13.1 Introduction
Linear models make the strong assumption of linear relationships between inputs and outputs.
A simple way of increasing the flexibility of linear models is to perform a feature transformation, replacing $\mathbf{x}$ by $\phi(\mathbf{x})$. For example, polynomial expansion in 1d uses $\phi(x) = [1, x, x^2, \ldots, x^K]$.
The model now becomes
$$f(\mathbf{x}; \boldsymbol{\theta}) = \mathbf{W}\,\phi(\mathbf{x}) + \mathbf{b}.$$
This is still linear in the parameters $\boldsymbol{\theta} = (\mathbf{W}, \mathbf{b})$, which makes fitting easy since the NLL is convex, but specifying the feature transform $\phi$ manually is very limiting.
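As a concrete illustration, here is a minimal sketch of a fixed polynomial basis expansion followed by an ordinary least-squares fit. The degree $K=5$ and the toy sine data are hypothetical choices, not taken from the text.

```python
import numpy as np

def poly_features(x, K):
    """Map each scalar x_n to phi(x_n) = [1, x_n, x_n^2, ..., x_n^K]."""
    return np.vander(x, N=K + 1, increasing=True)

# Toy 1d regression data (illustrative).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(50)

Phi = poly_features(x, K=5)                   # fixed, hand-specified feature transform
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # the model is still linear in w
y_hat = Phi @ w                               # predictions from the expanded features
```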
A natural solution is to endow the feature extractor with its own parameters:
$$f(\mathbf{x}; \boldsymbol{\theta}) = \mathbf{W}\,\phi(\mathbf{x}; \boldsymbol{\theta}_2) + \mathbf{b},$$
where $\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ and $\boldsymbol{\theta}_1 = (\mathbf{W}, \mathbf{b})$.
We can repeat this process recursively to create more complex feature extractors:
$$f(\mathbf{x}; \boldsymbol{\theta}) = f_L(f_{L-1}(\cdots f_1(\mathbf{x}) \cdots)),$$
where $f_\ell(\mathbf{x}) = f(\mathbf{x}; \boldsymbol{\theta}_\ell)$ is the function at layer $\ell$.
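To make the recursive construction concrete, below is a minimal NumPy sketch of such a composition of layers, each with its own parameters $\boldsymbol{\theta}_\ell = (\mathbf{W}_\ell, \mathbf{b}_\ell)$. The ReLU activation, layer widths, and random initialization are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def init_layer(rng, n_in, n_out):
    """One layer's parameters theta_l = (W_l, b_l)."""
    return {"W": rng.standard_normal((n_out, n_in)) * 0.1,
            "b": np.zeros(n_out)}

def mlp(x, layers):
    """Compose the layers: f_L(f_{L-1}(... f_1(x) ...)); the last layer is left linear."""
    h = x
    for layer in layers[:-1]:
        h = relu(layer["W"] @ h + layer["b"])   # f_l(h) with its own parameters
    last = layers[-1]
    return last["W"] @ h + last["b"]

rng = np.random.default_rng(0)
sizes = [4, 32, 32, 1]                          # illustrative layer widths
layers = [init_layer(rng, n_in, n_out)
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
y = mlp(rng.standard_normal(4), layers)         # length-1 output for a 4-d input
```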
Deep neural networks (DNNs) encompass a large family of models in which we compose differentiable functions into a DAG (directed acyclic graph).
The construction above is the simplest example, where the DAG is a chain: this is called a feedforward neural network (FFNN) or multilayer perceptron (MLP).
An MLP assumes the input is a fixed-dimensional vector $\mathbf{x} \in \mathbb{R}^D$, often called tabular data or structured data, since the data can be stored in a design matrix in which each column has a specific meaning (age, length, etc.).
Other kinds of DNNs are better suited to unstructured data (text, images), where each individual element (a pixel or a word) carries little meaning on its own. Convolutional neural networks (CNNs) have historically performed well on images, transformers on sequences, and graph neural networks (GNNs) on graphs.