4. Statistics

4.1 Intro

We estimate the parameters $\theta$ of a probability model from data $\mathcal{D}$. Most estimation methods can be written as an optimization problem of the form

\begin{equation}
\hat{\theta} = \operatorname*{arg\,min}_{\theta} \mathcal{L}(\theta)
\end{equation}

where $\mathcal{L}(\theta)$ is a loss (or objective) function that measures how poorly the parameters $\theta$ explain $\mathcal{D}$.

4.2 Maximum likelihood estimation (MLE)
4.3 Empirical risk minimization (ERM)
4.4 Other estimation methods
4.5 Regularization
4.6 Bayesian statistics
4.7 Frequentist statistics
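To make the generic estimator $\hat{\theta} = \operatorname*{arg\,min}_{\theta} \mathcal{L}(\theta)$ from 4.1 concrete, here is a minimal sketch. It makes three assumptions that are not in the notes. The model is a Bernoulli distribution. The data is a hypothetical toy dataset of ten coin flips. The loss is the negative log-likelihood, which is the choice that leads to MLE (4.2). The helper names `bernoulli_nll` and `argmin_grid` are hypothetical. The arg min is approximated by brute-force grid search rather than a real optimizer.

```python
import math

def bernoulli_nll(theta, data):
    """Negative log-likelihood L(theta) of i.i.d. 0/1 observations under Bernoulli(theta)."""
    n1 = sum(data)            # number of ones in the data
    n0 = len(data) - n1       # number of zeros in the data
    return -(n1 * math.log(theta) + n0 * math.log(1.0 - theta))

def argmin_grid(loss, data, num_points=9999):
    """Approximate the argmin of loss(theta, data) over theta in (0, 1) with a uniform grid.

    Hypothetical helper: a stand-in for a proper numerical optimizer.
    """
    grid = [(i + 1) / (num_points + 1) for i in range(num_points)]  # 0.0001 ... 0.9999
    return min(grid, key=lambda t: loss(t, data))

# Hypothetical toy dataset D: 7 ones out of 10 observations.
D = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
theta_hat = argmin_grid(bernoulli_nll, D)
print(round(theta_hat, 3))  # prints 0.7, matching the closed-form MLE 7/10
```

The grid search only illustrates the "minimize a loss" view. For this particular model, the minimizer has the closed form $\hat{\theta} = N_1 / N$ (the fraction of ones), and the numerical result agrees with it.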