12.4 Maximum likelihood estimation
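GLMs can be fit using methods similar to those used for logistic regression. In particular, the negative log-likelihood (NLL), ignoring constant terms, is:
$$\mathrm{NLL}(w) = -\log p(\mathcal{D} \mid w) = -\frac{1}{\sigma^2} \sum_{n=1}^N \ell_n$$
where:
$$\ell_n \triangleq \eta_n y_n - A(\eta_n)$$
and $\eta_n = w^\top x_n$, with $A$ the log partition function. For notational simplicity, we assume $\sigma^2 = 1$.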
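As a minimal sketch of this objective, the NLL can be evaluated for any GLM once the log partition function $A$ is supplied. The function names `glm_nll`, `bernoulli_A`, and `poisson_A` and the toy data below are illustrative assumptions, not part of the text. The sketch takes $\sigma^2 = 1$ and drops constant terms, as above.

```python
import numpy as np

def glm_nll(w, X, y, log_partition):
    """NLL(w) = -sum_n [eta_n * y_n - A(eta_n)], with sigma^2 = 1, constants dropped."""
    eta = X @ w                      # natural parameters eta_n = w^T x_n
    return -np.sum(eta * y - log_partition(eta))

def bernoulli_A(eta):
    # A(eta) = log(1 + exp(eta)), computed stably
    return np.logaddexp(0.0, eta)

def poisson_A(eta):
    # A(eta) = exp(eta)
    return np.exp(eta)

# Toy data (hypothetical): three examples, a bias column plus one feature
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]])
w = np.array([0.1, -0.3])
y_binary = np.array([1.0, 0.0, 1.0])
y_counts = np.array([2.0, 0.0, 5.0])

print(glm_nll(w, X, y_binary, bernoulli_A))  # logistic regression NLL
print(glm_nll(w, X, y_counts, poisson_A))    # Poisson regression NLL, up to a constant
```

For the Bernoulli case this coincides with the usual binary cross-entropy. For the Poisson case the dropped constant is $\sum_n \log y_n!$, which does not depend on $w$.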
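We can compute the gradient of a single term $\ell_n$ as follows:
$$g_n \triangleq \frac{\partial \ell_n}{\partial w} = \frac{\partial \ell_n}{\partial \eta_n} \frac{\partial \eta_n}{\partial w} = (y_n - A'(\eta_n))\, x_n = (y_n - \mu_n)\, x_n$$
where $\mu_n = f(w^\top x_n)$ and $f$ is the inverse link function mapping the canonical parameters to the mean parameters.
In the case of logistic regression, we have $f(\eta_n) = \sigma(\eta_n)$, the sigmoid function.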
This gradient expression can be used in SGD or other gradient methods.
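The following sketch shows one way this gradient might be used in practice. It assumes logistic regression as the GLM and, for brevity, uses plain full-batch gradient descent rather than SGD. The names `glm_nll_grad` and `fit_gd`, the step size, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def glm_nll_grad(w, X, y, inv_link):
    """Gradient of the NLL: -sum_n (y_n - mu_n) x_n, with mu_n = f(w^T x_n)."""
    mu = inv_link(X @ w)
    return -X.T @ (y - mu)

def fit_gd(X, y, inv_link, lr=0.5, n_steps=2000):
    """Full-batch gradient descent on the NLL averaged over examples."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w -= lr * glm_nll_grad(w, X, y, inv_link) / len(y)
    return w

# Synthetic logistic-regression data (hypothetical parameters)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
w_true = np.array([-0.5, 1.5])
y = (rng.uniform(size=200) < sigmoid(X @ w_true)).astype(float)

w_hat = fit_gd(X, y, sigmoid)
print(w_hat)
```

Swapping `sigmoid` for another inverse link (for example `np.exp` for Poisson regression) reuses the same gradient code, since only $\mu_n = f(w^\top x_n)$ changes. A different model may need a different step size.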
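The Hessian is given by:
$$H = \frac{\partial^2}{\partial w\, \partial w^\top} \mathrm{NLL}(w) = -\sum_{n=1}^N \frac{\partial g_n}{\partial w^\top}$$
where $g_n = \partial \ell_n / \partial w = (y_n - \mu_n)\, x_n$ is the per-example gradient, and:
$$\frac{\partial g_n}{\partial w^\top} = \frac{\partial g_n}{\partial \mu_n} \frac{\partial \mu_n}{\partial w^\top} = -x_n f'(w^\top x_n)\, x_n^\top$$
hence:
$$H = \sum_{n=1}^N f'(\eta_n)\, x_n x_n^\top$$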
For example, in the case of logistic regression, $f(\eta_n) = \sigma(\eta_n)$ and $f'(\eta_n) = \sigma(\eta_n)(1 - \sigma(\eta_n))$.
In general, we see that the Hessian is positive definite since $f'(\eta_n) > 0$ (provided the $x_n$ span the input space); hence the NLL is strictly convex, so the MLE for the GLM is unique (assuming $f'(\eta_n) > 0$ for all $n$).
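Because the Hessian has this simple weighted form, it can be plugged into a second-order method. The sketch below assumes logistic regression and uses undamped Newton steps on synthetic data. The names `glm_hessian` and `fit_newton` and the fixed iteration count are illustrative assumptions, and a practical implementation would usually add a line search or damping.

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def sigmoid_deriv(eta):
    # f'(eta) = sigma(eta) * (1 - sigma(eta)) for logistic regression
    mu = sigmoid(eta)
    return mu * (1.0 - mu)

def glm_hessian(w, X, inv_link_deriv):
    """H = sum_n f'(eta_n) x_n x_n^T, written as X^T diag(f'(eta)) X."""
    s = inv_link_deriv(X @ w)
    return X.T @ (s[:, None] * X)

def fit_newton(X, y, inv_link, inv_link_deriv, n_steps=20):
    """Undamped Newton iterations: w <- w - H^{-1} g."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        g = -X.T @ (y - inv_link(X @ w))       # gradient of the NLL
        H = glm_hessian(w, X, inv_link_deriv)  # Hessian of the NLL
        w -= np.linalg.solve(H, g)
    return w

# Synthetic logistic-regression data (hypothetical parameters)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
w_true = np.array([-0.5, 1.5])
y = (rng.uniform(size=200) < sigmoid(X @ w_true)).astype(float)

w_hat = fit_newton(X, y, sigmoid, sigmoid_deriv)
# All eigenvalues should be positive when X has full column rank
print(np.linalg.eigvalsh(glm_hessian(w_hat, X, sigmoid_deriv)))
```

Checking the eigenvalues of `glm_hessian` is a quick numerical sanity check of the positive-definiteness claim on a given dataset.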