Linear approximation for one variable
Most textbooks explain the idea of finding the tangent line at a certain point of a function. The geometric idea is that, over a very small interval, the function can be approximated by a linear function. Some textbooks present this as zooming in on a function's graph: if we take a parabola and zoom in enough, a small piece of it is rendered as a straight line on a computer's screen. That is the whole geometric idea of the derivative.
In calculus we are always plotting graphs over a Euclidean space, and in Euclidean geometry the shortest distance between two points is a straight line. This helps explain why the problem of finding a tangent line arises in the first place: between two points there are infinitely many paths, but exactly one of them is a straight line, and it minimizes the distance travelled between the two points. Not every teacher mentions this, partly because class time is often too short to cover it.
The tangent line is a good approximation of the function as long as we accept a certain margin of error; the graph shows that beyond a certain distance from the point the error becomes too large. One way to think about it is to consider how hard it is to calculate the value of a function: near a given point we may accept a small error in exchange for a function that is easier or faster to evaluate. Numerical methods discuss this trade-off more carefully, because there we want to answer questions such as "How close are we to the real value? Is there a bound on the error?"
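As a quick illustration of how the error grows away from the point of tangency, here is a minimal sketch in Python, assuming [math]\displaystyle{ f(x) = x^2 }[/math] and [math]\displaystyle{ p = 1 }[/math] (these choices are illustrative, not taken from the text):

```python
# A minimal sketch of how the tangent-line error grows away from the point
# of tangency. The function f(x) = x**2 and the point p = 1 are illustrative
# choices, not taken from the text.

def f(x):
    return x * x

def f_prime(x):
    return 2 * x

p = 1.0

def tangent(x):
    # Affine approximation f(p) + f'(p)(x - p)
    return f(p) + f_prime(p) * (x - p)

for h in (0.001, 0.01, 0.1, 0.5, 1.0):
    x = p + h
    error = abs(f(x) - tangent(x))
    print(f"x = {x:5.3f}   f(x) = {f(x):7.4f}   tangent = {tangent(x):7.4f}   error = {error:.6f}")
```

For this particular function the error is exactly [math]\displaystyle{ (x - p)^2 }[/math], so it is tiny near [math]\displaystyle{ p }[/math] and grows quickly as we move away.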
From analytic geometry we can find the equation of a line given two points, or one point and the slope (the angular coefficient). The same idea applies to finding functions that pass through given points. Starting from the derivative's definition, a very simple algebraic manipulation gives the affine function that approximates a function near a point [math]\displaystyle{ p }[/math]:
[math]\displaystyle{ \frac{f(x) - f(p)}{x - p} \approx f'(p) }[/math] for [math]\displaystyle{ x }[/math] close to [math]\displaystyle{ p }[/math] (remember that [math]\displaystyle{ x - p \neq 0 }[/math])
[math]\displaystyle{ f(x) \approx f(p) + f'(p)(x - p) }[/math]
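As a concrete example (a standard illustration, not taken from the text above), take [math]\displaystyle{ f(x) = \sqrt{x} }[/math] and [math]\displaystyle{ p = 4 }[/math]. Then [math]\displaystyle{ f'(4) = \tfrac{1}{2\sqrt{4}} = \tfrac{1}{4} }[/math] and
[math]\displaystyle{ \sqrt{4.1} \approx f(4) + f'(4)(4.1 - 4) = 2 + \frac{0.1}{4} = 2.025, }[/math]
while the true value is approximately 2.02485.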
Notice that in the above graph there is an [math]\displaystyle{ E(x) }[/math] that represents the difference between the function itself and the affine approximation. Therefore:
[math]\displaystyle{ f(x) = f(p) + f'(p)(x - p) + E(x) }[/math]
The approximation is exact only where [math]\displaystyle{ E(x) = 0 }[/math], which happens at [math]\displaystyle{ x = p }[/math] itself. What makes the tangent line the best affine approximation is that the error vanishes faster than the distance to [math]\displaystyle{ p }[/math]: [math]\displaystyle{ E(x)/(x - p) \to 0 }[/math] as [math]\displaystyle{ x \to p }[/math].
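A minimal numerical check of this property, again assuming the illustrative choice [math]\displaystyle{ f(x) = x^2 }[/math] and [math]\displaystyle{ p = 1 }[/math]:

```python
# Sketch: check numerically that E(x) = f(x) - f(p) - f'(p)(x - p)
# vanishes faster than (x - p). Uses f(x) = x**2 at p = 1 as an
# illustrative choice.

def f(x):
    return x * x

p = 1.0
fp = 2 * p  # derivative of x**2 evaluated at p

for h in (0.1, 0.01, 0.001, 0.0001):
    x = p + h
    E = f(x) - f(p) - fp * (x - p)
    print(f"h = {h:7.4f}   E(x) = {E:.8f}   E(x)/(x - p) = {E / (x - p):.8f}")
```

The ratio [math]\displaystyle{ E(x)/(x - p) }[/math] shrinks along with [math]\displaystyle{ h }[/math], which is exactly the sense in which the tangent line is the best affine approximation near [math]\displaystyle{ p }[/math].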