7.7 Solving Systems of Linear Equations

We can formulate linear equations in the form $A\bold{x}=\bold{b}$.

If we have $m$ equations and $n$ unknowns, then $A$ will be an $m\times n$ matrix and $\bold{b}$ an $m\times 1$ vector.

  • If $m=n$ and $A$ is full rank, there is a single unique solution
  • If $m<n$, the system is underdetermined, so there is no unique solution
  • If $m>n$, the system is overdetermined: not all the lines intersect at the same point, so in general no exact solution exists

7.7.1 Solving square systems

When $m=n$, we can use the LU decomposition to solve for $\bold{x}$:

$$\begin{align} A\bold{x}&=\bold{b} \\ LU\bold{x}&=\bold{b} \\ U\bold{x}&=L^{-1}\bold{b}\triangleq\bold{y} \\ \bold{x}&=U^{-1}\bold{y} \end{align}$$

The point is that $L$ and $U$ are triangular, so we don't need to compute any inverse and can use forward and back substitution instead. For $L^{-1}\bold{b}=\bold{y}$, i.e. the lower triangular system $L\bold{y}=\bold{b}$:

$$\begin{pmatrix} L_{11} & 0 & \cdots & 0 \\ L_{21} & L_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{n1} & L_{n2} & \cdots & L_{nn} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

We start by solving $y_1=b_1/L_{11}$, then substitute it into $L_{21}y_1+L_{22}y_2=b_2$ to get $y_2$, and iterate down the rows.

Once we have $\bold{y}$, we solve $U\bold{x}=\bold{y}$ the same way by back substitution, this time starting from the last row.
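As an illustrative sketch (not from the text), here is this two-stage triangular solve in numpy/scipy on a made-up random system; note that `scipy.linalg.lu` pivots, returning a permutation matrix $P$ with $A=PLU$, a detail the derivation above omits:

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))  # square and (almost surely) full rank
b = rng.normal(size=4)

P, L, U = lu(A)  # A = P L U, with L lower- and U upper-triangular

# Forward substitution: solve L y = P^T b, starting from the first row
y = solve_triangular(L, P.T @ b, lower=True)
# Back substitution: solve U x = y, starting from the last row
x = solve_triangular(U, y, lower=False)

assert np.allclose(A @ x, b)
```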

7.7.2 Solving under-constrained systems (least norm estimation)

We assume $m<n$ and $A$ to be full rank, so there are multiple solutions:

$$\{\bold{x}:A\bold{x}=\bold{b}\}=\{\bold{x}_p+\bold{z}:\bold{z}\in\mathrm{nullspace}(A)\}$$

where $\bold{x}_p$ is any particular solution. It is standard to pick the solution minimizing the $\ell_2$ norm:

$$\hat{\bold{x}}=\argmin_{\bold{x}}||\bold{x}||^2_2\;\mathrm{s.t.}\;A\bold{x}=\bold{b}$$

We can compute the minimal norm solution using the right pseudo-inverse:

$$\bold{x}=A^\top(AA^\top)^{-1}\bold{b}$$
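A minimal numpy sketch of this formula on an illustrative random system; as a cross-check, `np.linalg.lstsq` also returns the minimum-norm solution for underdetermined problems:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                   # fewer equations than unknowns
A = rng.normal(size=(m, n))   # full row rank almost surely
b = rng.normal(size=m)

# Right pseudo-inverse: x = A^T (A A^T)^{-1} b
# (solve the m x m system instead of forming the inverse explicitly)
x_min = A.T @ np.linalg.solve(A @ A.T, b)
assert np.allclose(A @ x_min, b)

# Cross-check: lstsq returns the minimum-norm solution when m < n
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_min, x_lstsq)
```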

We can also derive this solution by treating it as a constrained optimization problem and minimizing the Lagrangian:

$$\begin{align} \mathcal{L}(\bold{x},\bold{\lambda})&=\bold{x}^\top\bold{x}+\bold{\lambda}^\top (A\bold{x}-\bold{b})\\ \nabla_\bold{x}\mathcal{L}&=2\bold{x} +A^\top\bold{\lambda}=\bold{0} \\ \nabla_{\bold{\lambda}}\mathcal{L}&= A\bold{x}-\bold{b}=\bold{0} \end{align}$$

Therefore:

$$\begin{align} \bold{x}&=-\frac{1}{2}A^\top\bold{\lambda} \\ A\bold{x}&= -\frac{1}{2}AA^\top \bold{\lambda}=\bold{b} \Rightarrow \bold{\lambda}=-2(AA^\top)^{-1}\bold{b} \end{align}$$

Substituting back, we find the right pseudo-inverse again:

$$\bold{x}=A^\top(AA^\top)^{-1}\bold{b}$$

7.7.3 Solving over-constrained systems (least squares estimation)

If $m>n$, we try to find the solution that comes closest to satisfying all the constraints specified by $A\bold{x}=\bold{b}$, by minimizing the least squares objective:

$$f(\bold{x})=\frac{1}{2}||A\bold{x}-\bold{b}||^2_2$$

We know that the gradient is:

$$g(\bold{x})=\frac{\partial}{\partial \bold{x}} f(\bold{x})=A^\top(A\bold{x}-\bold{b})=A^\top A\bold{x}-A^\top \bold{b}$$

Setting this gradient to zero, we obtain the ordinary least squares (OLS) solution:

$$\hat{\bold{x}}=(A^\top A)^{-1}A^\top \bold{b}$$

with $A^\dagger=(A^\top A)^{-1} A^\top$ the left pseudo-inverse of $A$.
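A short numpy sketch contrasting this normal-equations formula with `np.linalg.lstsq` on an illustrative random system:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 3                  # more equations than unknowns
A = rng.normal(size=(m, n))   # full column rank almost surely
b = rng.normal(size=m)

# Normal equations: x = (A^T A)^{-1} A^T b, solved without an explicit inverse
x_ols = np.linalg.solve(A.T @ A, A.T @ b)

# Library route: factorization-based least squares
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_ols, x_lstsq)
```

Note that forming $A^\top A$ squares the condition number of $A$, so the factorization-based routine is preferred for ill-conditioned problems.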

We can check that this solution is unique by showing that the Hessian is positive definite:

$$h(\bold{x})=\frac{\partial^2}{\partial \bold{x}^2}f(\bold{x})=A^\top A$$

which is positive definite if $A$ is full rank, since for any $\bold{v}\neq\bold{0}$ we have:

$$\bold{v}^\top A^\top A \bold{v}=(A\bold{v})^\top (A\bold{v})=||A\bold{v}||^2 >0$$
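A quick numerical sanity check of this claim, on an illustrative random $A$ with full column rank:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(10, 3))      # full column rank almost surely

H = A.T @ A                       # Hessian of the least squares objective
eigvals = np.linalg.eigvalsh(H)   # H is symmetric, so use eigvalsh
assert np.all(eigvals > 0)        # all eigenvalues positive => positive definite
```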