
11. Linear Regression

The key property of linear regression is that the expected value of the output is assumed to be a linear function of the input:

$$\mathbb{E}[y|\bold{x}] = \bold{w}^\top \bold{x}$$
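To make this concrete, here is a minimal NumPy sketch (not from the book) that fits $\bold{w}$ by ordinary least squares on synthetic data and uses it to compute the linear conditional mean; the names `X`, `w_true`, and `w_hat` are illustrative.

```python
import numpy as np

# Synthetic data: y is a linear function of x plus Gaussian noise,
# so E[y|x] = w_true^T x holds by construction.
rng = np.random.default_rng(0)
n, d = 1_000, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Ordinary least-squares estimate of w.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # should be close to w_true

# The fitted conditional mean is linear in the input: E[y|x] ≈ w_hat^T x.
x_new = np.array([1.0, 0.0, -1.0])
print(w_hat @ x_new)
```

Section 11.2 covers this least-squares estimate of $\bold{w}$ in detail.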