7.8 Matrix calculus
7.8.1 Derivatives
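For a scalar-argument function $f: \mathbb{R} \to \mathbb{R}$, we define its derivative at a point $x$ as the quantity:
$$f'(x) \triangleq \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
assuming the limit exists.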
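As a rough numerical illustration of the limit definition, the sketch below approximates a derivative with a small finite step. It uses a central difference, a common and more accurate variant of the one-sided quotient in the definition. The helper name `derivative` and the step size `h` are assumptions made for this example.

```python
import math

def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x) for a scalar function f."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare against the known derivative of sin, which is cos.
approx = derivative(math.sin, 0.5)
exact = math.cos(0.5)
print(approx, exact)
```

The two printed values should agree to many decimal places. Making `h` much smaller eventually hurts accuracy because of floating-point cancellation.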
7.8.2 Gradients
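We can extend this definition to a vector-argument function $f: \mathbb{R}^n \to \mathbb{R}$ by defining the partial derivative of $f$ w.r.t. $x_i$ as:
$$\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(\mathbf{x} + h\mathbf{e}_i) - f(\mathbf{x})}{h}$$
where $\mathbf{e}_i$ is the $i$th unit vector.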
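The gradient of a function $f$ at a point $\mathbf{x}$ is its vector of partial derivatives:
$$\mathbf{g} = \frac{\partial f}{\partial \mathbf{x}} = \nabla f = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix}$$
where the operator $\nabla$ maps a function $f: \mathbb{R}^n \to \mathbb{R}$ to another function $\mathbf{g}: \mathbb{R}^n \to \mathbb{R}^n$.
The point at which the gradient is evaluated is denoted:
$$\mathbf{g}(\mathbf{x}^*) \triangleq \frac{\partial f}{\partial \mathbf{x}}\bigg|_{\mathbf{x}^*}$$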
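A minimal sketch of the gradient as a vector of partial derivatives: each coordinate is perturbed along its unit vector in turn. The function `gradient`, the test function `f`, and the evaluation point are hypothetical choices for illustration, and central differences stand in for the exact limit.

```python
def gradient(f, x, h=1e-6):
    """Approximate the gradient of f: R^n -> R at x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h     # x + h * e_i
        x_minus[i] -= h    # x - h * e_i
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

def f(x):
    # Example function: f(x) = x0^2 + 3 * x0 * x1
    return x[0] ** 2 + 3.0 * x[0] * x[1]

# Analytic gradient is [2*x0 + 3*x1, 3*x0], which is [8, 3] at x = (1, 2).
g = gradient(f, [1.0, 2.0])
print(g)
```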
7.8.3 Directional derivative
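The directional derivative measures how much $f: \mathbb{R}^n \to \mathbb{R}$ changes along a direction $\mathbf{v}$ in space:
$$D_{\mathbf{v}} f(\mathbf{x}) = \lim_{h \to 0} \frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h}$$
Note that the directional derivative is the inner product of the gradient with the direction:
$$D_{\mathbf{v}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v}$$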
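The relationship between the directional derivative and the gradient can be checked numerically. The sketch below compares a finite-difference estimate along a direction `v` with the dot product of an analytic gradient and `v`. The example function, its hand-derived gradient, and the chosen point are assumptions for illustration.

```python
def directional_derivative(f, x, v, h=1e-6):
    """Central-difference estimate of the rate of change of f at x along v."""
    x_plus = [xi + h * vi for xi, vi in zip(x, v)]
    x_minus = [xi - h * vi for xi, vi in zip(x, v)]
    return (f(x_plus) - f(x_minus)) / (2 * h)

def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def grad_f(x):
    # Hand-derived gradient of f.
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]

x = [1.0, 2.0]
v = [0.6, 0.8]  # a unit-length direction

lhs = directional_derivative(f, x, v)                # finite-difference estimate
rhs = sum(gi * vi for gi, vi in zip(grad_f(x), v))   # gradient dotted with v
print(lhs, rhs)
```

This is why a single directional derivative costs roughly two function evaluations, whatever the dimension of the input.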
7.8.4 Total derivative
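Suppose the function has the form $f(t, x(t), y(t))$. We define the total derivative of $f$ w.r.t. $t$ as:
$$\frac{df}{dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$
Multiplying by $dt$, we get the total differential:
$$df = \frac{\partial f}{\partial t}dt + \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy$$
This represents how much $f$ changes when we change $t$, both directly and indirectly through the effect of $t$ on $x$ and $y$.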
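To make the direct and indirect contributions concrete, the sketch below evaluates a total derivative term by term and compares it with a finite-difference derivative of the composed function. The specific choices `f(t, x, y) = t*x + y^2`, `x(t) = sin t` and `y(t) = t^2` are hypothetical examples.

```python
import math

def f(t, x, y):
    return t * x + y ** 2

def x_of(t):
    return math.sin(t)

def y_of(t):
    return t ** 2

def total_derivative(t):
    """df/dt assembled from partial derivatives and the chain rule."""
    x, y = x_of(t), y_of(t)
    df_dt_direct = x           # partial of f w.r.t. t, holding x and y fixed
    df_dx, dx_dt = t, math.cos(t)
    df_dy, dy_dt = 2.0 * y, 2.0 * t
    return df_dt_direct + df_dx * dx_dt + df_dy * dy_dt

def numeric_derivative(t, h=1e-6):
    """Finite-difference derivative of the composed function t -> f(t, x(t), y(t))."""
    composed = lambda s: f(s, x_of(s), y_of(s))
    return (composed(t + h) - composed(t - h)) / (2 * h)

t0 = 0.7
print(total_derivative(t0), numeric_derivative(t0))
```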
7.8.5 Jacobian
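Consider $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$. The Jacobian of this function is an $m \times n$ matrix of partial derivatives:
$$\mathbf{J}_{\mathbf{f}}(\mathbf{x}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}^\top} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix} = \begin{pmatrix} \nabla f_1(\mathbf{x})^\top \\ \vdots \\ \nabla f_m(\mathbf{x})^\top \end{pmatrix}$$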
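The layout of the Jacobian (one row per output, one column per input) can be seen in a small finite-difference sketch. The helper `jacobian` and the example map from $\mathbb{R}^2$ to $\mathbb{R}^3$ are assumptions chosen for illustration.

```python
import math

def jacobian(f, x, h=1e-6):
    """Approximate the m x n Jacobian of f: R^n -> R^m at x."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):                 # perturb one input at a time
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):             # column j holds d f_i / d x_j
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def f(x):
    # Example map R^2 -> R^3.
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

x = [1.0, 2.0]
J = jacobian(f, x)
# Analytic Jacobian: [[x1, x0], [cos(x0), 0], [0, 2*x1]]
J_exact = [[2.0, 1.0], [math.cos(1.0), 0.0], [0.0, 4.0]]
print(J)
```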
7.8.5.1 Vector Product
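The Jacobian-vector product (JVP) right-multiplies the Jacobian $\mathbf{J} \in \mathbb{R}^{m \times n}$ by a vector $\mathbf{v} \in \mathbb{R}^n$:
$$\mathbf{J}_{\mathbf{f}}(\mathbf{x})\mathbf{v} = \begin{pmatrix} \nabla f_1(\mathbf{x})^\top \mathbf{v} \\ \vdots \\ \nabla f_m(\mathbf{x})^\top \mathbf{v} \end{pmatrix}$$
Similarly, the vector-Jacobian product (VJP) left-multiplies the Jacobian by a vector $\mathbf{u} \in \mathbb{R}^m$:
$$\mathbf{u}^\top \mathbf{J}_{\mathbf{f}}(\mathbf{x}) = \left(\mathbf{u} \cdot \frac{\partial \mathbf{f}}{\partial x_1}, \ldots, \mathbf{u} \cdot \frac{\partial \mathbf{f}}{\partial x_n}\right)$$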
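The two products differ only in which side the vector sits on, which determines the shape of the result. The sketch below uses an explicit, made-up 3 x 2 Jacobian. The helper names `jvp` and `vjp` are illustrative and are not taken from any particular library.

```python
# Hypothetical 3 x 2 Jacobian (m = 3 outputs, n = 2 inputs).
J = [[2.0, 1.0],
     [0.5, 0.0],
     [0.0, 4.0]]

def jvp(J, v):
    """Right-multiply: J v, one dot product per row of J (length m result)."""
    return [sum(J_ij * v_j for J_ij, v_j in zip(row, v)) for row in J]

def vjp(u, J):
    """Left-multiply: u^T J, one dot product per column of J (length n result)."""
    return [sum(u_i * row[j] for u_i, row in zip(u, J)) for j in range(len(J[0]))]

v = [1.0, -1.0]        # vector in input space, length n
u = [1.0, 2.0, 3.0]    # vector in output space, length m

jv = jvp(J, v)
uj = vjp(u, J)
print(jv, uj)

# A unit vector picks out a single column (JVP) or a single row (VJP).
col0 = jvp(J, [1.0, 0.0])
row1 = vjp([0.0, 1.0, 0.0], J)
```

Automatic differentiation systems typically compute these products without ever materializing the full Jacobian. The explicit matrix here is only for clarity.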
7.8.5.2 Composition of functions
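The Jacobian of the composition of two functions is obtained with the chain rule:
Let $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$ and $\mathbf{g}: \mathbb{R}^m \to \mathbb{R}^p$, and let $\mathbf{h}(\mathbf{x}) = \mathbf{g}(\mathbf{f}(\mathbf{x}))$. We have:
$$\mathbf{J}_{\mathbf{h}}(\mathbf{x}) = \mathbf{J}_{\mathbf{g}}(\mathbf{f}(\mathbf{x}))\,\mathbf{J}_{\mathbf{f}}(\mathbf{x})$$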
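A small worked check of the chain rule for Jacobians: the product of the outer Jacobian (evaluated at the inner function's output) and the inner Jacobian should match direct differentiation of the composition. The functions `f` and `g` and the helper `matmul` are hypothetical choices for this sketch.

```python
def f(x):
    # Inner function f: R^2 -> R^2.
    return [x[0] * x[1], x[0] + x[1]]

def jac_f(x):
    # Analytic 2 x 2 Jacobian of f.
    return [[x[1], x[0]],
            [1.0, 1.0]]

def jac_g(y):
    # Analytic 1 x 2 Jacobian of the outer function g(y) = [y0 * y1].
    return [[y[1], y[0]]]

def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = [1.0, 2.0]
# Chain rule: Jacobian of g at f(x), times Jacobian of f at x.
J_h = matmul(jac_g(f(x)), jac_f(x))

# Direct differentiation of h(x) = x0 * x1 * (x0 + x1).
J_h_direct = [[2.0 * x[0] * x[1] + x[1] ** 2, x[0] ** 2 + 2.0 * x[0] * x[1]]]
print(J_h, J_h_direct)
```

Note the shapes: a 1 x 2 matrix times a 2 x 2 matrix gives the 1 x 2 Jacobian of the composition.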
7.8.6 Hessian
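For $f: \mathbb{R}^n \to \mathbb{R}$ that is twice differentiable, the Hessian is the $n \times n$ symmetric matrix of second partial derivatives:
$$\mathbf{H}_f = \frac{\partial^2 f}{\partial \mathbf{x}^2} = \nabla^2 f = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}$$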
The Hessian is the Jacobian of the gradient.
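The statement that the Hessian is the Jacobian of the gradient suggests a direct numerical recipe: differentiate each component of the gradient with respect to each input. The sketch below does this with central differences on a hand-derived gradient. The example function and the helper name `hessian` are assumptions for illustration.

```python
def grad_f(x):
    # Hand-derived gradient of f(x) = x0^2 * x1 + x1^3.
    return [2.0 * x[0] * x[1], x[0] ** 2 + 3.0 * x[1] ** 2]

def hessian(grad, x, h=1e-5):
    """Approximate the Hessian as the Jacobian of the gradient function."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        g_plus, g_minus = grad(x_plus), grad(x_minus)
        for i in range(n):
            H[i][j] = (g_plus[i] - g_minus[i]) / (2 * h)
    return H

x = [1.0, 2.0]
H = hessian(grad_f, x)
# Analytic Hessian: [[2*x1, 2*x0], [2*x0, 6*x1]] = [[4, 2], [2, 12]]
H_exact = [[4.0, 2.0], [2.0, 12.0]]
print(H)
```

The off-diagonal entries agree up to numerical error, reflecting the symmetry of the Hessian.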
7.8.7 Gradients of commonly used functions
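7.8.7.2 Functions that map vectors to scalars
$$\frac{\partial (\mathbf{a}^\top \mathbf{x})}{\partial \mathbf{x}} = \mathbf{a}$$
$$\frac{\partial (\mathbf{b}^\top \mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = \mathbf{A}^\top \mathbf{b}$$
$$\frac{\partial (\mathbf{x}^\top \mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}^\top)\mathbf{x}$$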
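Vector-to-scalar identities are easy to sanity-check numerically. As one example, the sketch below checks the standard identity that the gradient of the quadratic form $\mathbf{x}^\top \mathbf{A}\mathbf{x}$ is $(\mathbf{A} + \mathbf{A}^\top)\mathbf{x}$, using a made-up non-symmetric matrix. All names and values here are illustrative assumptions.

```python
A = [[1.0, 2.0],
     [3.0, 4.0]]   # deliberately non-symmetric

def quad(x):
    """Quadratic form x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def analytic_grad(x):
    """(A + A^T) x, the closed-form gradient of the quadratic form."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

def numeric_grad(f, x, h=1e-6):
    """Central-difference gradient of f: R^n -> R."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

x = [1.0, -1.0]
g_exact = analytic_grad(x)
g_num = numeric_grad(quad, x)
print(g_exact, g_num)
```

When $\mathbf{A}$ is symmetric, the closed form reduces to $2\mathbf{A}\mathbf{x}$.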
7.8.7.3 Functions that map matrices to scalars
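Quadratic forms:
$$\frac{\partial}{\partial \mathbf{X}}(\mathbf{a}^\top \mathbf{X}\mathbf{b}) = \mathbf{a}\mathbf{b}^\top$$
$$\frac{\partial}{\partial \mathbf{X}}(\mathbf{a}^\top \mathbf{X}^\top \mathbf{b}) = \mathbf{b}\mathbf{a}^\top$$
Traces:
$$\frac{\partial}{\partial \mathbf{X}}\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}) = \mathbf{A}^\top \mathbf{B}^\top$$
$$\frac{\partial}{\partial \mathbf{X}}\mathrm{tr}(\mathbf{X}^\top \mathbf{A}) = \mathbf{A}$$
$$\frac{\partial}{\partial \mathbf{X}}\mathrm{tr}(\mathbf{X}^{-1}\mathbf{A}) = -\mathbf{X}^{-\top}\mathbf{A}^\top \mathbf{X}^{-\top}$$
$$\frac{\partial}{\partial \mathbf{X}}\mathrm{tr}(\mathbf{X}^\top \mathbf{A}\mathbf{X}) = (\mathbf{A} + \mathbf{A}^\top)\mathbf{X}$$
Determinants (for square, invertible matrices):
$$\frac{\partial}{\partial \mathbf{X}}\det(\mathbf{A}\mathbf{X}\mathbf{B}) = \det(\mathbf{A}\mathbf{X}\mathbf{B})\,\mathbf{X}^{-\top}$$
$$\frac{\partial}{\partial \mathbf{X}}\log\det(\mathbf{X}) = \mathbf{X}^{-\top}$$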
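Matrix-to-scalar identities can be verified entry by entry in the same way. The sketch below checks the standard result that the gradient of $\log\det(\mathbf{X})$ with respect to $\mathbf{X}$ is $\mathbf{X}^{-\top}$, for a made-up 2 x 2 matrix with positive determinant. The helper names are assumptions for this example, and the closed-form inverse only applies to the 2 x 2 case.

```python
import math

def det2(X):
    """Determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv_transpose2(X):
    """Transpose of the inverse of a 2 x 2 matrix, from the adjugate formula."""
    d = det2(X)
    return [[X[1][1] / d, -X[1][0] / d],
            [-X[0][1] / d, X[0][0] / d]]

def logdet(X):
    return math.log(det2(X))

def numeric_grad(fn, X, h=1e-6):
    """Central-difference derivative of a scalar function w.r.t. each entry of X."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (fn(X_plus) - fn(X_minus)) / (2 * h)
    return G

X = [[2.0, 1.0],
     [0.5, 3.0]]   # determinant is 5.5, so the log is defined

G_num = numeric_grad(logdet, X)
G_exact = inv_transpose2(X)
print(G_num, G_exact)
```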