Defining a function
A function is a rule that associates elements of one set with elements of another (there are more rigorous ways to define it, but that definition is good enough for now). You have a set D where each of its elements is associated with an element of another set, E. The set D is called the Domain of the function and the set E is called the Range of the function. Each element of D is called the input, or independent variable, and each element of E is called the output, or dependent variable.
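As a concrete sketch (the sets and values here are made up for illustration), a finite function can be modelled as a Python dictionary: the keys form the domain and the values it produces form the range.

```python
# A finite function as a dictionary: each key (input) is
# associated with exactly one value (output).
f = {1: 10, 2: 20, 3: 20}  # hypothetical example sets

domain = set(f.keys())     # the set D of inputs
outputs = set(f.values())  # the outputs actually produced

print(domain)   # {1, 2, 3}
print(outputs)  # {10, 20}
```

Note that two different inputs may share an output (here both 2 and 3 map to 20); that's still a function, since each input has exactly one output.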
The term dependent may cause confusion in the case of a constant function, that is, a function that outputs the same value regardless of the input. But even a constant function can only output its value if there is an input, whatever it may be. Without an input, nothing can be output.
For two or more variables the definition is very much the same, except that an additional question may arise here: are the independent variables independent from each other? In calculus, yes. We can very well think of many cases in which the independent variables have some dependency on each other, but for all purposes of calculus they don't. Take [math]\displaystyle{ f(x,y) = x + y }[/math], for example. As long as we consider all pairs [math]\displaystyle{ (x,y) }[/math] within this function's domain, any choice made for [math]\displaystyle{ x }[/math] does not depend on the choice made for [math]\displaystyle{ y }[/math] and vice versa. In linear algebra a closely related idea goes by the name of linear independence.
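A minimal way to see that independence, assuming we restrict the domain to a small finite grid: every pair [math]\displaystyle{ (x,y) }[/math] is allowed, so picking a value for one variable never constrains the other.

```python
# f(x, y) = x + y evaluated over every pair in a small grid.
# Any x can be combined with any y: the choices are independent.
xs = [0, 1, 2]
ys = [0, 1, 2]

pairs = [(x, y) for x in xs for y in ys]        # the Cartesian product
values = {(x, y): x + y for (x, y) in pairs}

print(len(pairs))      # 9 pairs: each x pairs with each y
print(values[(1, 2)])  # 3
```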
The arrow is a fairly intuitive way of showing that elements of D are mapped to elements of E. Notice that each and every element of D must map to some element of E; if there is one that doesn't, it's not part of the domain. Notice also that each [math]\displaystyle{ x }[/math] corresponds to some [math]\displaystyle{ f(x) }[/math], which means that they are all ordered pairs [math]\displaystyle{ (x, \ f(x)) }[/math]. Ordered because the arrow clearly shows that direction matters: you can't just write them in random order, because then you destroy their relationship.
Notation: [math]\displaystyle{ f \ : D \ \to \ E }[/math]. Most of the time we don't need to write this and end up writing [math]\displaystyle{ f(x) = x^2 + 3x - 5 }[/math], because it's pretty obvious from the equation that represents the function which sets D and E are. When we plot the graph of a function it doesn't matter whether we use [math]\displaystyle{ f(x) }[/math] or [math]\displaystyle{ y }[/math] to name the vertical axis; it's just more common to rely on [math]\displaystyle{ x, \ y }[/math] (and [math]\displaystyle{ z }[/math] for functions of two variables).
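For instance, the rule [math]\displaystyle{ f(x) = x^2 + 3x - 5 }[/math] can be written down directly, with the understanding that both D and E are the real numbers:

```python
# f: R -> R given only by its equation; the domain and
# codomain are implied to be the real numbers.
def f(x):
    return x**2 + 3*x - 5

print(f(0))  # -5
print(f(2))  # 5
```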
Can the arrow be reversed? Yes, this is exactly the idea of the inverse function. However, not every function can be reversed, just as some processes cannot be reversed (there are many examples in physics and chemistry). The condition for a function to be reversible is that for each [math]\displaystyle{ f(x) }[/math] there is only one [math]\displaystyle{ x }[/math] such that [math]\displaystyle{ x \to f(x) }[/math]. In other words, from [math]\displaystyle{ x }[/math] we reach [math]\displaystyle{ f(x) }[/math], and from [math]\displaystyle{ f(x) }[/math] we can return to [math]\displaystyle{ x }[/math]. If two different inputs map to the same output, [math]\displaystyle{ x_1, \ x_2 \to f(x) }[/math], we simply cannot know from [math]\displaystyle{ f(x) }[/math] whether to return to [math]\displaystyle{ x_1 }[/math] or [math]\displaystyle{ x_2 }[/math]. Hence the function cannot be reversed.
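That condition can be sketched with a small made-up mapping: an inverse exists only when no two inputs share the same output, so reversing the arrows still leaves one input per output.

```python
def invert(mapping):
    """Reverse the arrows of a finite function, or return None
    if two inputs share an output (the function is not reversible)."""
    inverse = {}
    for x, y in mapping.items():
        if y in inverse:   # y is already reached from another x
            return None
        inverse[y] = x
    return inverse

g = {1: "a", 2: "b", 3: "c"}  # one output per input: reversible
h = {1: "a", 2: "a"}          # x1 and x2 both map to "a"

print(invert(g))  # {'a': 1, 'b': 2, 'c': 3}
print(invert(h))  # None: can't tell whether "a" came from 1 or 2
```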
A natural question arises about inverse functions: we don't learn the inverse function theorem at school, but we can very well at least learn what the inverse of a function means. What about multivariable functions? The concept is the same, but a complication arises. The domain of a function of a single (real) variable is a set of numbers; the domain of a function of two variables is a set of ordered pairs. With multivariable functions, infinitely many ordered pairs can map to the same value. What do we do then? It's not possible to explain this in a short paragraph here.
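One way to picture the complication, over a small made-up grid: for [math]\displaystyle{ f(x,y) = x + y }[/math], many different pairs land on the same output, so reversing a single value does not pick out a single pair.

```python
# Collect every pair (x, y) in a small grid that maps to each value.
pairs_for = {}
for x in range(4):
    for y in range(4):
        pairs_for.setdefault(x + y, []).append((x, y))

# Four different inputs all produce the output 3:
print(pairs_for[3])  # [(0, 3), (1, 2), (2, 1), (3, 0)]
```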
An analogy: suppose we have a machine that takes in water, energy, sugar, flavours and corn syrup to produce candies. It can only produce candies; it can't produce, say, cakes with the same input required to produce candies. On the other hand, if we take that same machine and input salt, corn starch and strawberries, it won't produce anything, because it was never designed to process salt, corn starch and strawberries.
Note about terminology: in many places you see the words "parameter", "argument" and "unknown". For the most part they all mean the input variable. "Parameter" is often found when we have equations, especially with a geometrical meaning, because increasing or decreasing its value makes a circle larger or changes the slope of a line, for example. In physics, variables often carry some meaning, such as setting the rate of some variation. A practical example would be equations that describe how transparent a certain material is to visible light: it can be more or less transparent, and that property is usually governed by a parameter. "Argument" is most common in mathematics to refer to a function's arguments. "Unknown" is a synonym for "variable" most of the time; it's more common when we talk about solving equations or systems of equations.
Codomain and image: for the most part in calculus, in English textbooks, "range" is used as a synonym for both. There seem to be some differences among textbooks and between English and Portuguese. The word "domain" doesn't cause confusion, but what would the codomain be? I'm going to borrow the word "codependency" from psychology: the codomain is the set that contains all the dependent variables, and, analogous to the psychology term, it depends on the domain. In Portuguese-language textbooks it's called the counterdomain (I have no idea why the Portuguese term is almost the opposite of the English one). As for the term image, I have a textbook that defines it via the pair [math]\displaystyle{ (x, \ f(x)) }[/math]: [math]\displaystyle{ f(x) }[/math] is called the image of [math]\displaystyle{ x }[/math]. If we think about the word image, it gives the idea that [math]\displaystyle{ x }[/math] goes into the function, which processes it and produces something we can see, the image; that would be a point on the function's graph. The set of all images and the codomain can have the same number of elements, but the image can never go beyond the codomain. One example is [math]\displaystyle{ f(x) = x^2 }[/math], a function that maps real numbers to real numbers. The codomain is all real numbers, but the function cannot produce negative numbers: its image is the set of non-negative reals, [math]\displaystyle{ [0, \ \infty) }[/math].
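The [math]\displaystyle{ x^2 }[/math] example can be checked over a small sample of the domain (a sketch only; the true domain is all reals): every output lands inside the codomain, but the negative part of the codomain is never reached.

```python
# Image of f(x) = x^2 over a finite slice of the domain.
sample_domain = range(-3, 4)
image = {x**2 for x in sample_domain}

print(sorted(image))               # [0, 1, 4, 9]
print(all(v >= 0 for v in image))  # True: no negative outputs
```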
Physical interpretation: we are often not interested in all values of a function, but just a particular range. For example, a function may represent volume with respect to something. The function itself can assume negative values, but a negative volume doesn't exist. Another example: speed over time. Due to the laws of physics, speed can't be an infinitely large quantity; there is the upper limit of the speed of light. We can, nonetheless, plot the graph. It just won't have any physical meaning beyond the speed of light. That's where pure math comes in, where the physical meaning of a function is lost.
Functions from data points: in calculus, most of the time, you are given the function. In many scenarios, though, the function is unknown; what is known is the data. If the data is nicely ordered in a table, numerical patterns may be quickly noticed: easy patterns such as periodic oscillations, increasing at a constant rate, increasing at an exponential rate, or decreasing exponentially. If an easy pattern cannot be identified, you'd have to plot the data on the Cartesian plane to see if there is any noticeable pattern. This is more or less an exercise of guessing which of the known functions best approximates the data's behaviour.
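A rough sketch of that kind of pattern spotting, using made-up table values: constant differences between consecutive entries suggest a linear rule, while constant ratios suggest an exponential one.

```python
def constant_differences(ys):
    """Same difference throughout -> increasing at a constant rate."""
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    return len(set(diffs)) == 1

def constant_ratios(ys):
    """Same ratio throughout -> exponential growth or decay."""
    ratios = [b / a for a, b in zip(ys, ys[1:])]
    return len(set(ratios)) == 1

linear_data = [2, 5, 8, 11, 14]        # hypothetical table values
exponential_data = [3, 6, 12, 24, 48]  # hypothetical table values

print(constant_differences(linear_data))  # True
print(constant_ratios(exponential_data))  # True
```

Real measured data would have noise, so in practice the differences or ratios would only be approximately constant; this only illustrates the idea.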
In experimental physics, most of the time you already know the function. The classes aren't about trying to guess which function it is; the experiments already give you the function, and the goal is more about testing the limits within which that function remains a good approximation for the data you gather. Those experiments involve a little bit of statistics, because some error analysis has to be performed on the data.
In practical terms, data can also display bizarre behaviour, in which case one's knowledge about the phenomenon is what allows those behaviours to be spotted. For example: the speed of sound in air is known. If one point in a table depicts an excessively high speed, it either means that the data itself is wrong due to some collection error, or that somebody inserted data that shouldn't have been there. For statistics those points may be relevant, such as determining whether they represent an error or a genuine spike, a spike in crime rates for instance.
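A minimal sketch of flagging such a point (the threshold and data here are invented for illustration): compare each measurement against the known speed of sound and flag anything implausibly far off.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

def flag_outliers(speeds, tolerance=0.5):
    """Flag measurements deviating from the known value by more than
    the given fraction (a made-up rule of thumb, not real statistics)."""
    return [s for s in speeds
            if abs(s - SPEED_OF_SOUND) / SPEED_OF_SOUND > tolerance]

measurements = [340.1, 344.8, 2100.0, 339.5]  # hypothetical table
print(flag_outliers(measurements))  # [2100.0]
```

Whether a flagged point is a collection error or a genuine anomaly is a judgment that the code can't make; it only brings the point to our attention.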