Chain rule for multivariable functions

From Applied Science

For a function of a single variable, the chain rule tells us that [math]\displaystyle{ [f(g(x))]' = g'(x)f'(g(x)) }[/math]. For multivariable functions the idea is the same: the derivative is still built from products of derivatives, now one product for each intermediate variable. All the functions involved have to be differentiable for the chain rule to work. Textbooks differ in how they present this. We have essentially two cases to treat: one is [math]\displaystyle{ f(g(t),h(t)) }[/math]; the other is [math]\displaystyle{ f(g(s,t),h(s,t)) }[/math]. One of the textbooks that I follow goes for a general form [math]\displaystyle{ f(\gamma(t)) }[/math], where [math]\displaystyle{ \gamma(t) }[/math] is a vector-valued function with n components.

I'm going to begin with the easiest case, [math]\displaystyle{ f(\gamma(t)) }[/math], where [math]\displaystyle{ \gamma(t) = (x(t), y(t)) }[/math] is a vector function, that is, a curve whose components [math]\displaystyle{ x(t) }[/math] and [math]\displaystyle{ y(t) }[/math] are both differentiable. Before moving on to calculations, notice that a change in [math]\displaystyle{ t }[/math] changes [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math], which in turn changes the value of [math]\displaystyle{ f }[/math]. This means that [math]\displaystyle{ f }[/math] depends, indirectly, on [math]\displaystyle{ t }[/math].

An increment [math]\displaystyle{ \Delta t }[/math] is going to produce the increments [math]\displaystyle{ \Delta x }[/math] and [math]\displaystyle{ \Delta y }[/math]. As such:

[math]\displaystyle{ \Delta f = f(x + \Delta x, y + \Delta y) - f(x,y) = \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + \eta r }[/math].

where [math]\displaystyle{ \eta \to 0 }[/math] as [math]\displaystyle{ r = \sqrt{(\Delta x)^2 + (\Delta y)^2} \to 0 }[/math], because [math]\displaystyle{ f }[/math] is differentiable. The last term, [math]\displaystyle{ \eta r }[/math], is the error of the linear approximation. Since we are differentiating with respect to [math]\displaystyle{ t }[/math], we divide by [math]\displaystyle{ \Delta t }[/math]:

[math]\displaystyle{ \frac{\Delta f}{\Delta t} = \frac{\partial f}{\partial x} \cdot \frac{\Delta x}{\Delta t} + \frac{\partial f}{\partial y} \cdot \frac{\Delta y}{\Delta t} \pm \eta\sqrt{\left(\frac{\Delta x}{\Delta t}\right)^2 + \left(\frac{\Delta y}{\Delta t}\right)^2} }[/math]

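where the sign of the last term is positive if [math]\displaystyle{ \Delta t \gt 0 }[/math] and negative if [math]\displaystyle{ \Delta t \lt 0 }[/math]. When we take the limit [math]\displaystyle{ \Delta t \to 0 }[/math], the increments [math]\displaystyle{ \Delta x }[/math] and [math]\displaystyle{ \Delta y }[/math] also go to zero, because [math]\displaystyle{ x(t) }[/math] and [math]\displaystyle{ y(t) }[/math] are continuous, so [math]\displaystyle{ \eta \to 0 }[/math]. Meanwhile the square root tends to the finite value [math]\displaystyle{ \sqrt{x'(t)^2 + y'(t)^2} }[/math], so the last term goes away. The resulting expression is: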

[math]\displaystyle{ \frac{d}{dt}f(x(t),y(t)) = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} }[/math]
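
As a quick illustration (the function and the curve are chosen only for this example and do not come from any particular textbook), take [math]\displaystyle{ f(x,y) = x^2 y }[/math] with [math]\displaystyle{ x(t) = \cos t }[/math] and [math]\displaystyle{ y(t) = \sin t }[/math]. The rule gives:

[math]\displaystyle{ \frac{d}{dt}f(x(t),y(t)) = 2xy \cdot (-\sin t) + x^2 \cdot \cos t = -2\cos t \sin^2 t + \cos^3 t }[/math]

Substituting first, [math]\displaystyle{ f(x(t),y(t)) = \cos^2 t \sin t }[/math], and differentiating directly with the product rule gives the same expression, as it should.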

Taking a closer look, the chain rule is a dot product between the gradient and another vector, [math]\displaystyle{ (x'(t), y'(t)) }[/math]. It's pretty similar to the directional derivative and that is no coincidence, because when we want to find rates of change in space, we have to have a direction.

Another way to see it:

[math]\displaystyle{ \frac{d}{dt}f(P(t)) = \nabla f \cdot P'(t) }[/math]

where [math]\displaystyle{ P(t) = (x(t),y(t)) }[/math] and [math]\displaystyle{ P'(t) = (x'(t),y'(t)) }[/math].
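
The gradient form can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function `f`, the curve `P` and the helper names are all assumptions made for illustration, and the central difference is only a cross-check of the formula.

```python
import math

def f(x, y):
    # Illustrative scalar function, chosen only for this example.
    return math.sin(x) * y

def grad_f(x, y):
    # Gradient of f computed by hand: (df/dx, df/dy).
    return (math.cos(x) * y, math.sin(x))

def P(t):
    # Illustrative curve P(t) = (x(t), y(t)).
    return (t**2, 3.0 * t + 1.0)

def P_prime(t):
    # Velocity vector P'(t) = (x'(t), y'(t)).
    return (2.0 * t, 3.0)

def chain_rule(t):
    # d/dt f(P(t)) as the dot product of the gradient and P'(t).
    gx, gy = grad_f(*P(t))
    vx, vy = P_prime(t)
    return gx * vx + gy * vy

def central_difference(t, h=1e-6):
    # Numerical derivative of t -> f(P(t)), used only as a cross-check.
    return (f(*P(t + h)) - f(*P(t - h))) / (2.0 * h)

if __name__ == "__main__":
    t0 = 0.7
    print("chain rule:        ", chain_rule(t0))
    print("central difference:", central_difference(t0))
```

The two printed values should agree to several decimal places. The small remaining gap comes from the finite step `h`, not from the chain rule.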

A natural question arises here: what can we infer from [math]\displaystyle{ \nabla f \cdot (x'(t), y'(t)) = 0 }[/math]? From analytic geometry we know that the dot product of two nonzero vectors is zero exactly when the vectors are perpendicular. We also know that the gradient is perpendicular to a level curve. Suppose that [math]\displaystyle{ \gamma(t) }[/math] parametrizes a level curve, a circle for example. As we walk along the circle the function keeps the same level, which is the [math]\displaystyle{ z }[/math] coordinate in the case of a function of two variables. As we know, infinitely many [math]\displaystyle{ (x,y) }[/math] pairs correspond to the same level. More than that, [math]\displaystyle{ (x'(t), y'(t)) }[/math] is tangent to the level curve.

For each point on a level curve we have a tangent vector and the gradient. This means that we have a whole family of pairs of vectors, one pair for each value of [math]\displaystyle{ t }[/math], so the dot product is itself a function of [math]\displaystyle{ t }[/math] that is identically zero. The reasoning in the previous paragraph can be summarized in the following equation:

[math]\displaystyle{ F(t) = f(x(t),y(t)) = k }[/math] for all [math]\displaystyle{ t }[/math]

Since [math]\displaystyle{ F }[/math] is constant, its derivative with respect to [math]\displaystyle{ t }[/math] is zero:

[math]\displaystyle{ F'(t) = \frac{\partial f}{\partial x}x'(t) + \frac{\partial f}{\partial y}y'(t) = \nabla f \cdot P'(t) = 0 }[/math].

Suppose that [math]\displaystyle{ P'(t) \neq 0 }[/math]. Then this shows that the directional derivative of [math]\displaystyle{ f }[/math] in the direction of [math]\displaystyle{ \overrightarrow{u} = \frac{P'(t)}{||P'(t)||} }[/math], tangent to the level curve, is zero:

[math]\displaystyle{ D_uf = \nabla f \cdot \overrightarrow{u} = 0 }[/math]

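With this we have shown that the gradient of [math]\displaystyle{ f }[/math] is perpendicular to the tangent of the level curve at every point where [math]\displaystyle{ P'(t) \neq 0 }[/math]. Equivalently, the rate of change of [math]\displaystyle{ f }[/math] is zero when we move through [math]\displaystyle{ (x,y) }[/math] pairs that belong to the same level curve, which is another way of saying that [math]\displaystyle{ f(P) }[/math] stays constant there.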
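
The perpendicularity can be seen numerically on a concrete level curve. The sketch below assumes, purely for illustration, the function [math]\displaystyle{ f(x,y) = x^2/4 + y^2 }[/math] and its level curve [math]\displaystyle{ f = 1 }[/math], an ellipse parametrized by [math]\displaystyle{ P(t) = (2\cos t, \sin t) }[/math]. None of these choices, nor the helper names, come from the text above.

```python
import math

def f(x, y):
    # Illustrative function; its level curve f = 1 is an ellipse.
    return x**2 / 4.0 + y**2

def grad_f(x, y):
    # Gradient of f computed by hand.
    return (x / 2.0, 2.0 * y)

def P(t):
    # Parametrization of the level curve f = 1.
    return (2.0 * math.cos(t), math.sin(t))

def P_prime(t):
    # Tangent vector to the level curve at P(t).
    return (-2.0 * math.sin(t), math.cos(t))

def gradient_dot_tangent(t):
    # Dot product of the gradient at P(t) with the tangent vector P'(t).
    gx, gy = grad_f(*P(t))
    vx, vy = P_prime(t)
    return gx * vx + gy * vy

# Sample twelve points around the ellipse.
samples = [k * math.pi / 6.0 for k in range(12)]
levels = [f(*P(t)) for t in samples]
dots = [gradient_dot_tangent(t) for t in samples]

if __name__ == "__main__":
    for t, level, dot in zip(samples, levels, dots):
        print(f"t={t:.3f}  f(P(t))={level:.12f}  grad.P'={dot:.3e}")
```

Every sampled point should report the same level (up to rounding) and a dot product that is zero up to rounding, which is the statement [math]\displaystyle{ \nabla f \cdot P'(t) = 0 }[/math] for this particular curve.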

The natural extension of the previous rule covers functions where each variable is itself a function of two or more variables: [math]\displaystyle{ f(x(s,t),y(s,t)) }[/math]. For such cases a substitution comes in handy to avoid losing track of the functions and variables in the process. If we set [math]\displaystyle{ x(s,t) = u }[/math] and [math]\displaystyle{ y(s,t) = v }[/math], then we know how to differentiate [math]\displaystyle{ f(u,v) }[/math] from the previous rule, treating one of [math]\displaystyle{ s }[/math] and [math]\displaystyle{ t }[/math] as the variable while the other is held fixed. In turn, we already know how to differentiate [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math].

If [math]\displaystyle{ F(s,t) = f(x(s,t),y(s,t)) }[/math]

Then [math]\displaystyle{ \frac{\partial F}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t} }[/math]

And [math]\displaystyle{ \frac{\partial F}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} }[/math]
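
Both partial derivatives can be cross-checked numerically. The sketch below assumes, only for illustration, [math]\displaystyle{ f(x,y) = x^2 y }[/math] with the substitution [math]\displaystyle{ x(s,t) = s\cos t }[/math], [math]\displaystyle{ y(s,t) = s\sin t }[/math]. The helper names are hypothetical, and the finite differences serve only to confirm the two formulas.

```python
import math

def f(x, y):
    # Illustrative outer function.
    return x**2 * y

def grad_f(x, y):
    # (df/dx, df/dy), computed by hand.
    return (2.0 * x * y, x**2)

def x(s, t):
    return s * math.cos(t)

def y(s, t):
    return s * math.sin(t)

def F(s, t):
    # Composite function F(s, t) = f(x(s, t), y(s, t)).
    return f(x(s, t), y(s, t))

def dF_ds(s, t):
    # Chain rule in s: f_x * dx/ds + f_y * dy/ds, with t held fixed.
    fx, fy = grad_f(x(s, t), y(s, t))
    return fx * math.cos(t) + fy * math.sin(t)

def dF_dt(s, t):
    # Chain rule in t: f_x * dx/dt + f_y * dy/dt, with s held fixed.
    fx, fy = grad_f(x(s, t), y(s, t))
    return fx * (-s * math.sin(t)) + fy * (s * math.cos(t))

def numeric_partials(s, t, h=1e-6):
    # Central differences, varying one variable at a time.
    d_s = (F(s + h, t) - F(s - h, t)) / (2.0 * h)
    d_t = (F(s, t + h) - F(s, t - h)) / (2.0 * h)
    return d_s, d_t

if __name__ == "__main__":
    s0, t0 = 1.5, 0.4
    print("chain rule:", dF_ds(s0, t0), dF_dt(s0, t0))
    print("numeric:   ", numeric_partials(s0, t0))
```

Note that each finite difference moves only one of `s` and `t`, which mirrors what the partial derivative means: the other variable is held fixed.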

One may ask about [math]\displaystyle{ \frac{d F}{d(s,t)} }[/math]. Remember, partial derivative really means partial: we differentiate with respect to one variable while the others are held fixed. There is no meaning in differentiating with respect to two or more variables at the same time.