Chain rule for multivariable functions

With a single-variable function the chain rule tells us that [math]\displaystyle{ [f(g(x))]' = g'(x)f'(g(x)) }[/math]. For multivariable functions the idea is the same: the derivative of a composition is still built from products of derivatives. Both functions have to be differentiable for the chain rule to work. Textbooks organize this topic in different ways. Essentially, there are two cases to treat: one is [math]\displaystyle{ f(g(t),h(t)) }[/math], where the inner functions depend on a single variable; the other is [math]\displaystyle{ f(g(s,t),h(s,t)) }[/math], where they depend on several variables. One of the textbooks that I follow goes for a general form [math]\displaystyle{ f(\gamma(t)) }[/math], where [math]\displaystyle{ \gamma(t) }[/math] is a vector-valued function of one variable with n components, that is, a curve in n-dimensional space.

I'm going to begin with the easiest case, [math]\displaystyle{ f(\gamma(t)) }[/math], where [math]\displaystyle{ \gamma(t) = (x(t), y(t)) }[/math] is a vector function, that is, a differentiable curve: [math]\displaystyle{ x(t) }[/math] and [math]\displaystyle{ y(t) }[/math] are both differentiable. Before moving on to calculations, notice that a change in [math]\displaystyle{ t }[/math] changes [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math], and therefore can change the value of [math]\displaystyle{ f }[/math]. This means that [math]\displaystyle{ f }[/math] depends, indirectly, on [math]\displaystyle{ t }[/math].

An increment [math]\displaystyle{ \Delta t }[/math] produces the increments [math]\displaystyle{ \Delta x }[/math] and [math]\displaystyle{ \Delta y }[/math], so:

[math]\displaystyle{ \Delta f = f(x + \Delta x, y + \Delta y) - f(x,y) = \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + \eta r }[/math].

where [math]\displaystyle{ \eta \to 0 }[/math] as [math]\displaystyle{ r = \sqrt{(\Delta x)^2 + (\Delta y)^2} \to 0 }[/math], because [math]\displaystyle{ f }[/math] is differentiable. The last term, [math]\displaystyle{ \eta r }[/math], is the error we make when we replace [math]\displaystyle{ \Delta f }[/math] by its linear approximation. Since we are differentiating with respect to [math]\displaystyle{ t }[/math], we divide both sides by [math]\displaystyle{ \Delta t }[/math]:
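As a quick numerical illustration of this definition, the sketch below uses a hypothetical example function, [math]\displaystyle{ f(x,y) = x^2 y + \sin y }[/math] (not from the text), computes its partial derivatives by hand, and solves the equation above for [math]\displaystyle{ \eta }[/math]. Shrinking the increments should make [math]\displaystyle{ \eta }[/math] shrink toward zero as well. The names `f`, `grad_f` and `eta` are illustrative only.

```python
import math


# Hypothetical example function (not from the text): f(x, y) = x**2 * y + sin(y)
def f(x: float, y: float) -> float:
    return x**2 * y + math.sin(y)


def grad_f(x: float, y: float) -> tuple[float, float]:
    # Partial derivatives worked out by hand: (df/dx, df/dy)
    return (2 * x * y, x**2 + math.cos(y))


def eta(x: float, y: float, dx: float, dy: float) -> float:
    """Solve Delta f = f_x dx + f_y dy + eta * r for eta, with r = sqrt(dx^2 + dy^2)."""
    fx, fy = grad_f(x, y)
    r = math.hypot(dx, dy)
    delta_f = f(x + dx, y + dy) - f(x, y)
    return (delta_f - fx * dx - fy * dy) / r


if __name__ == "__main__":
    x0, y0 = 1.0, 0.5
    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        # Increments along the fixed unit direction (0.6, 0.8), so r equals h up to rounding
        print(f"r = {h:g}  eta = {eta(x0, y0, 0.6 * h, 0.8 * h):.3e}")
```

For this smooth example [math]\displaystyle{ \eta }[/math] shrinks roughly in proportion to [math]\displaystyle{ r }[/math]; differentiability alone only guarantees that it tends to zero.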

[math]\displaystyle{ \frac{\Delta f}{\Delta t} = \frac{\partial f}{\partial x} \cdot \frac{\Delta x}{\Delta t} + \frac{\partial f}{\partial y} \cdot \frac{\Delta y}{\Delta t} \pm \eta\sqrt{\left(\frac{\Delta x}{\Delta t}\right)^2 + \left(\frac{\Delta y}{\Delta t}\right)^2} }[/math]

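where the sign of the last term is positive if [math]\displaystyle{ \Delta t \gt 0 }[/math] and negative if [math]\displaystyle{ \Delta t \lt 0 }[/math], because dividing [math]\displaystyle{ r }[/math] by [math]\displaystyle{ \Delta t }[/math] means dividing by [math]\displaystyle{ \sqrt{(\Delta t)^2} = |\Delta t| }[/math] and then attaching the sign of [math]\displaystyle{ \Delta t }[/math]. Now let [math]\displaystyle{ \Delta t \to 0 }[/math]. Since [math]\displaystyle{ x(t) }[/math] and [math]\displaystyle{ y(t) }[/math] are differentiable, [math]\displaystyle{ \frac{\Delta x}{\Delta t} \to x'(t) }[/math] and [math]\displaystyle{ \frac{\Delta y}{\Delta t} \to y'(t) }[/math]; they are also continuous, so [math]\displaystyle{ \Delta x, \Delta y \to 0 }[/math], hence [math]\displaystyle{ r \to 0 }[/math] and [math]\displaystyle{ \eta \to 0 }[/math]. The square root tends to the finite value [math]\displaystyle{ \sqrt{x'(t)^2 + y'(t)^2} }[/math], so the last term goes away. The resulting expression is: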

[math]\displaystyle{ \frac{d}{dt}f(x(t),y(t)) = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} }[/math]
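The limit argument can also be checked numerically. The sketch below assumes a hypothetical example, [math]\displaystyle{ f(x,y) = x^2 y + \sin y }[/math] with [math]\displaystyle{ x(t) = \cos t }[/math] and [math]\displaystyle{ y(t) = t^2 }[/math] (none of this is from the text), evaluates the chain-rule formula with hand-computed derivatives, and compares it with the difference quotient [math]\displaystyle{ \Delta f / \Delta t }[/math] for shrinking positive and negative [math]\displaystyle{ \Delta t }[/math].

```python
import math


# Hypothetical example (not from the text):
#   f(x, y) = x**2 * y + sin(y),  x(t) = cos(t),  y(t) = t**2
def f(x: float, y: float) -> float:
    return x**2 * y + math.sin(y)


def x(t: float) -> float:
    return math.cos(t)


def y(t: float) -> float:
    return t**2


def chain_rule(t: float) -> float:
    """df/dt = (df/dx) x'(t) + (df/dy) y'(t), using hand-computed derivatives."""
    xt, yt = x(t), y(t)
    fx = 2 * xt * yt            # partial f / partial x at (x(t), y(t))
    fy = xt**2 + math.cos(yt)   # partial f / partial y at (x(t), y(t))
    dx = -math.sin(t)           # x'(t)
    dy = 2 * t                  # y'(t)
    return fx * dx + fy * dy


def difference_quotient(t: float, dt: float) -> float:
    """Delta f / Delta t for the composite F(t) = f(x(t), y(t))."""
    return (f(x(t + dt), y(t + dt)) - f(x(t), y(t))) / dt


if __name__ == "__main__":
    t0 = 0.7
    print("chain rule:", chain_rule(t0))
    for dt in (1e-1, -1e-1, 1e-4, -1e-4):
        # Both signs of dt approach the same limit
        print(f"dt = {dt:+g}  quotient = {difference_quotient(t0, dt):.6f}")
```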

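Taking a closer look, the right-hand side is a dot product between the gradient [math]\displaystyle{ \nabla f }[/math], evaluated at [math]\displaystyle{ (x(t), y(t)) }[/math], and the velocity vector of the curve, [math]\displaystyle{ (x'(t), y'(t)) }[/math]. It is very similar to the directional derivative, and that is no coincidence: a rate of change in the plane only makes sense along a direction, and here the direction is given by the curve. When [math]\displaystyle{ (x'(t), y'(t)) }[/math] is a unit vector, the expression is exactly the directional derivative of [math]\displaystyle{ f }[/math] in that direction.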

Another way to see it:

[math]\displaystyle{ \frac{d}{dt}f(P(t)) = \nabla f \cdot P'(t) }[/math]

Where [math]\displaystyle{ P(t) = (x(t),y(t)) }[/math] and [math]\displaystyle{ P'(t) = (x'(t),y'(t)) }[/math].
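Written this way, the rule does not depend on the number of components of [math]\displaystyle{ P(t) }[/math], which matches the general form [math]\displaystyle{ f(\gamma(t)) }[/math]. A minimal sketch, assuming the gradient and the velocity [math]\displaystyle{ P'(t) }[/math] are supplied as hand-written Python functions (the helper names `dot` and `derivative_along_curve` are hypothetical):

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two vectors of the same length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def derivative_along_curve(
    grad_f: Callable[[Vector], Vector],
    P: Callable[[float], Vector],
    dP: Callable[[float], Vector],
    t: float,
) -> float:
    """Chain rule in vector form: d/dt f(P(t)) = grad f(P(t)) . P'(t)."""
    return dot(grad_f(P(t)), dP(t))


# Example: f(x, y) = x * y along P(t) = (t, t**3), so f(P(t)) = t**4
def grad_xy(p: Vector) -> Vector:
    return (p[1], p[0])          # (df/dx, df/dy) = (y, x)


def P_cubic(t: float) -> Vector:
    return (t, t**3)


def dP_cubic(t: float) -> Vector:
    return (1.0, 3 * t**2)


if __name__ == "__main__":
    print(derivative_along_curve(grad_xy, P_cubic, dP_cubic, 2.0))  # 32.0
```

Here [math]\displaystyle{ f(P(t)) = t^4 }[/math], whose derivative at [math]\displaystyle{ t = 2 }[/math] is [math]\displaystyle{ 4 \cdot 2^3 = 32 }[/math]; the dot product [math]\displaystyle{ (8, 2) \cdot (1, 12) }[/math] gives the same value.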

A natural question arises here: what can we infer from [math]\displaystyle{ \nabla f \cdot (x'(t), y'(t)) = 0 }[/math]? From analytic geometry we know that the dot product of two nonzero vectors is zero exactly when they are perpendicular. We also know that the gradient is perpendicular to the level curves. Suppose that [math]\displaystyle{ \gamma(t) }[/math] traces a level curve of [math]\displaystyle{ f }[/math], a circle for example. As we move along the circle, [math]\displaystyle{ f }[/math] keeps the same value (the same height [math]\displaystyle{ z }[/math], in the case of a function of two variables), so infinitely many points [math]\displaystyle{ (x,y) }[/math] correspond to that same level. Moreover, [math]\displaystyle{ (x'(t), y'(t)) }[/math] is tangent to the level curve.

At each point of a level curve we have a tangent vector and the gradient, so we get a whole family of pairs of vectors whose dot product is zero, one pair for each value of [math]\displaystyle{ t }[/math]. The reasoning in the previous paragraph can be summarized by the following equation:

[math]\displaystyle{ F(t) = f(x(t),y(t)) = k }[/math] for all [math]\displaystyle{ t }[/math]

Since [math]\displaystyle{ F }[/math] is constant, [math]\displaystyle{ \frac{d}{dt}F(t) = 0 }[/math]. By the chain rule:

[math]\displaystyle{ F'(t) = \frac{\partial f}{\partial x}x'(t) + \frac{\partial f}{\partial y}y'(t) = \nabla f \cdot P'(t) = 0 }[/math].

If [math]\displaystyle{ P'(t) \neq 0 }[/math], we can divide by [math]\displaystyle{ ||P'(t)|| }[/math], which shows that the directional derivative of [math]\displaystyle{ f }[/math] in the direction of the unit vector [math]\displaystyle{ \overrightarrow{u} = \frac{P'(t)}{||P'(t)||} }[/math], tangent to the level curve, is zero:

[math]\displaystyle{ D_uf = \nabla f \cdot \overrightarrow{u} = 0 }[/math]

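With this we have shown that, because [math]\displaystyle{ f(P) }[/math] is constant along a level curve, its rate of change in the direction tangent to that curve is zero wherever [math]\displaystyle{ P'(t) \neq 0 }[/math]; in other words, [math]\displaystyle{ \nabla f }[/math] is perpendicular to the level curve.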
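A small numerical check of this argument, assuming the hypothetical example [math]\displaystyle{ f(x,y) = x^2 + y^2 }[/math], whose level curve [math]\displaystyle{ f = 1 }[/math] is the unit circle parametrized by [math]\displaystyle{ P(t) = (\cos t, \sin t) }[/math]. Along the curve [math]\displaystyle{ f }[/math] should stay at level 1, and [math]\displaystyle{ \nabla f \cdot P'(t) }[/math] should vanish up to rounding error.

```python
import math


# Hypothetical example: f(x, y) = x**2 + y**2; its level curve f = 1 is the unit circle
def f(x: float, y: float) -> float:
    return x**2 + y**2


def grad_f(x: float, y: float) -> tuple[float, float]:
    return (2 * x, 2 * y)


def P(t: float) -> tuple[float, float]:
    # Parametrization of the level curve f = 1
    return (math.cos(t), math.sin(t))


def dP(t: float) -> tuple[float, float]:
    # Tangent (velocity) vector P'(t)
    return (-math.sin(t), math.cos(t))


def rate_along_level_curve(t: float) -> float:
    """F'(t) = grad f(P(t)) . P'(t) for F(t) = f(P(t))."""
    gx, gy = grad_f(*P(t))
    vx, vy = dP(t)
    return gx * vx + gy * vy


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0, 3.0):
        # f stays at level 1 and the rate of change is zero up to rounding
        print(f"t = {t}  f(P(t)) = {f(*P(t)):.12f}  F'(t) = {rate_along_level_curve(t):.1e}")
```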

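The natural extension of the previous rule covers functions where each variable is itself a function of two or more variables, such as [math]\displaystyle{ f(x(s,t),y(s,t)) }[/math]. In such cases substitution helps us avoid losing track of the functions and variables. If we set [math]\displaystyle{ u = x(s,t) }[/math] and [math]\displaystyle{ v = y(s,t) }[/math], then we know how to differentiate [math]\displaystyle{ f(u,v) }[/math] from the previous rule, and we already know how to take the partial derivatives of [math]\displaystyle{ u }[/math] and [math]\displaystyle{ v }[/math]. The key observation is that, if we hold [math]\displaystyle{ s }[/math] fixed, [math]\displaystyle{ t \mapsto (x(s,t), y(s,t)) }[/math] is a curve, so the previous rule applies directly, with ordinary derivatives replaced by partial derivatives with respect to [math]\displaystyle{ t }[/math]; the same holds with the roles of [math]\displaystyle{ s }[/math] and [math]\displaystyle{ t }[/math] swapped.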

If [math]\displaystyle{ F(s,t) = f(x(s,t),y(s,t)) }[/math]

Then [math]\displaystyle{ \frac{\partial F}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t} }[/math]

And [math]\displaystyle{ \frac{\partial F}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} }[/math]
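The two formulas can be checked against numerical partial derivatives, which vary one variable while holding the other fixed. The sketch assumes a hypothetical example, [math]\displaystyle{ f(x,y) = xy^2 }[/math] with [math]\displaystyle{ x(s,t) = s + t }[/math] and [math]\displaystyle{ y(s,t) = st }[/math], none of which comes from the text.

```python
# Hypothetical example (not from the text):
#   f(x, y) = x * y**2,  x(s, t) = s + t,  y(s, t) = s * t
def F(s: float, t: float) -> float:
    x, y = s + t, s * t
    return x * y**2


def chain_rule_partials(s: float, t: float) -> tuple[float, float]:
    """(dF/ds, dF/dt) from the two chain-rule formulas."""
    x, y = s + t, s * t
    f_x, f_y = y**2, 2 * x * y   # partials of f, evaluated at (x(s, t), y(s, t))
    x_s, x_t = 1.0, 1.0          # partials of x(s, t) = s + t
    y_s, y_t = t, s              # partials of y(s, t) = s * t
    return (f_x * x_s + f_y * y_s, f_x * x_t + f_y * y_t)


def numerical_partials(s: float, t: float, h: float = 1e-5) -> tuple[float, float]:
    """Central differences: vary one variable while holding the other fixed."""
    F_s = (F(s + h, t) - F(s - h, t)) / (2 * h)
    F_t = (F(s, t + h) - F(s, t - h)) / (2 * h)
    return (F_s, F_t)


if __name__ == "__main__":
    print(chain_rule_partials(1.0, 2.0))   # (28.0, 16.0)
    print(numerical_partials(1.0, 2.0))    # approximately (28.0, 16.0)
```

Expanding [math]\displaystyle{ F(s,t) = (s+t)s^2t^2 }[/math] and differentiating by hand gives the same values at [math]\displaystyle{ (s,t) = (1,2) }[/math].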

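One might ask about [math]\displaystyle{ \frac{d F}{d(s,t)} }[/math]. Remember, a partial derivative really is partial: it measures the rate of change with respect to one variable while the others are held fixed. There is no single scalar derivative with respect to two or more variables at the same time; the object that gathers both rates together is the gradient [math]\displaystyle{ \nabla F = \left(\frac{\partial F}{\partial s}, \frac{\partial F}{\partial t}\right) }[/math].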
