Proba ML
12. Generalized Linear Models

12.1 Introduction

We previously discussed:

  • the logistic regression model $p(y|\bold{x},\bold{w})=\mathrm{Ber}(\sigma(\bold{x}^\top \bold{w}))$
  • the linear regression model $p(y|\bold{x},\bold{w})=\mathcal{N}(y|\bold{x}^\top \bold{w},\sigma^2)$.
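Both models compute the same linear predictor $\bold{w}^\top\bold{x}$ and differ only in how it is mapped to the output distribution. A minimal sketch (the weights and input below are made-up values for illustration):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# made-up weights and input, for illustration only
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.2, 0.3])

eta = w @ x  # shared linear predictor w^T x

mu_linear = eta             # linear regression mean: E[y|x] = w^T x
mu_logistic = sigmoid(eta)  # logistic regression mean: sigma(w^T x)

print(mu_linear, mu_logistic)
```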

For both models, the mean of the output $\mathbb{E}[y|\bold{x},\bold{w}]$ depends on the input $\bold{x}$ only through the linear combination $\bold{w}^\top\bold{x}$.

Both models belong to the broader family of generalized linear models (GLM).

A GLM is a conditional version of an exponential family distribution, in which the natural parameters are a linear function of the input:

$$p(y_n|\bold{x}_n,\bold{w},\sigma^2)=\exp \Big[\frac{y_n\eta_n-A(\eta_n)}{\sigma^2} +\log h(y_n,\sigma^2)\Big]$$

where:

  • $\eta_n\triangleq \bold{w}^\top \bold{x}_n$ is the (input-dependent) natural parameter
  • $A(\eta_n)$ is the log normalizer
  • $\mathcal{T}(y)=y$ is the sufficient statistic
  • $\sigma^2$ is the dispersion term
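As a concrete instance, the Bernoulli distribution fits this template with natural parameter $\eta=\log\frac{\mu}{1-\mu}$ (the log-odds), log normalizer $A(\eta)=\log(1+e^\eta)$, $\sigma^2=1$, and $\log h=0$. A small sketch checking that the exponential-family form reproduces the usual pmf $\mu^y(1-\mu)^{1-y}$:

```python
import numpy as np

def bernoulli_glm_pmf(y, eta):
    # exponential-family form exp(y*eta - A(eta)),
    # with A(eta) = log(1 + e^eta), sigma^2 = 1, log h = 0
    A = np.log1p(np.exp(eta))
    return np.exp(y * eta - A)

mu = 0.3
eta = np.log(mu / (1 - mu))  # natural parameter = log-odds

for y in (0, 1):
    direct = mu**y * (1 - mu)**(1 - y)
    assert np.isclose(bernoulli_glm_pmf(y, eta), direct)
```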

We denote the mapping from the linear inputs to the mean of the output by $\mu_n=\ell^{-1}(\eta_n)$, known as the mean function, where $\ell$ is the link function.

$$\begin{align} \mathbb{E}[y_n|\bold{x}_n,\bold{w},\sigma^2] &= A'(\eta_n) \triangleq \ell^{-1}(\eta_n) \\ \mathbb{V}[y_n|\bold{x}_n,\bold{w},\sigma^2] &= A''(\eta_n)\sigma^2 \end{align}$$
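These identities can be checked numerically. A sketch for the Bernoulli case, where $A(\eta)=\log(1+e^\eta)$, so $A'(\eta)=\sigma(\eta)=\mu$ and $A''(\eta)=\mu(1-\mu)$ (with $\sigma^2=1$); finite differences on $A$ should recover both:

```python
import numpy as np

def A(eta):
    # Bernoulli log normalizer
    return np.log1p(np.exp(eta))

eta = 0.9   # an arbitrary natural-parameter value
h = 1e-4

# central finite differences for A'(eta) and A''(eta)
A1 = (A(eta + h) - A(eta - h)) / (2 * h)
A2 = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h**2

mu = 1 / (1 + np.exp(-eta))  # closed-form mean sigma(eta)

assert np.isclose(A1, mu, atol=1e-8)            # E[y] = A'(eta)
assert np.isclose(A2, mu * (1 - mu), atol=1e-6) # V[y] = A''(eta)
```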