7.2 Matrix multiplication

The product of $A\in\mathbb{R}^{m \times n}$ and $B\in\mathbb{R}^{n \times p}$ is $C=AB$, with:

$$C_{ij}=\sum_{k=1}^n A_{ik}B_{kj}$$

Note that the number of columns of $A$ must match the number of rows of $B$.

Naive matrix multiplication takes $O(mnp)$ time. In practice, it is sped up by optimized libraries (such as BLAS) and by hardware that parallelizes the computation, like GPUs and TPUs.

  • Matrix multiplication is associative: $A(BC)=(AB)C$
  • Matrix multiplication is distributive over addition: $A(B+C)=AB+AC$
  • In general, it is not commutative: $AB\neq BA$
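
As a quick sanity check, here is a minimal NumPy sketch (shapes and values are arbitrary, chosen only for illustration) of the definition and of these properties:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 3, 4, 5
A = rng.normal(size=(m, n))
B = rng.normal(size=(n, p))

# Definition: C_ij = sum_k A_ik B_kj
C = A @ B
C_manual = np.array([[sum(A[i, k] * B[k, j] for k in range(n))
                      for j in range(p)] for i in range(m)])
assert np.allclose(C, C_manual)

# Associativity: A(BC) = (AB)C
D = rng.normal(size=(p, 2))
assert np.allclose(A @ (B @ D), (A @ B) @ D)

# Not commutative in general (square matrices needed to even compare)
E, F = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
print(np.allclose(E @ F, F @ E))  # almost surely False
```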

7.2.1 Vector-vector products

Inner product

Given two vectors $\bold{x}, \bold{y}\in\mathbb{R}^n$, both of size $(n,1)$, the inner product is the scalar:

$$\lang\bold{x},\bold{y}\rang=\bold{x}^\top \bold{y}=\sum_{i=1}^n x_iy_i$$

Note that $\bold{x}^\top\bold{y}=\bold{y}^\top\bold{x}$.

Outer product

Given two vectors $\bold{x}\in \mathbb{R}^n$ and $\bold{y}\in \mathbb{R}^m$ (they no longer need to have the same size), the outer product is the matrix $\bold{x}\bold{y}^\top\in\mathbb{R}^{n\times m}$:

$$\bold{x}\bold{y}^\top=\begin{bmatrix}x_1y_1 & \dots & x_1y_m \\ \vdots &\ddots & \vdots \\ x_n y_1 & \dots & x_n y_m\end{bmatrix}$$
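
A minimal NumPy sketch of both products (the vectors below are arbitrary examples):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # x in R^3
y = np.array([4.0, 5.0, 6.0])   # y in R^3 (same size, required for the inner product)
z = np.array([1.0, -1.0])       # z in R^2 (a different size is fine for the outer product)

# Inner product: a scalar, symmetric in its arguments
assert np.isclose(x @ y, np.dot(y, x))
print(x @ y)            # 32.0

# Outer product: a (3, 2) matrix with entries x_i * z_j
print(np.outer(x, z))
```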

7.2.2 Vector-matrix products

Given a matrix $A\in\mathbb{R}^{m\times n}$ and a vector $\bold{x} \in \mathbb{R}^{n}$, the product is $\bold{y}=A\bold{x} \in \mathbb{R}^m$.

This can be viewed as inner products with the rows of $A$:

$$\bold{y}=\begin{bmatrix} a_1^\top \bold{x}\\ \vdots \\ a_m^\top\bold{x}\end{bmatrix}$$

Or as a linear combination of the columns of $A$:

$$\bold{y}= \begin{bmatrix}| \\ a_1 \\ |\end{bmatrix}x_1+\dots+\begin{bmatrix}| \\ a_n \\ |\end{bmatrix}x_n$$

In this latter view, the columns of $A$ act as a set of basis vectors defining a linear subspace.
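
Both views can be verified numerically; the following NumPy sketch (with arbitrary shapes) compares them against `A @ x`:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))   # m=4 rows, n=3 columns
x = rng.normal(size=3)

y = A @ x

# View 1: inner products with the rows of A
y_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# View 2: linear combination of the columns of A, weighted by the entries of x
y_cols = sum(A[:, j] * x[j] for j in range(A.shape[1]))

assert np.allclose(y, y_rows) and np.allclose(y, y_cols)
```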

7.2.3 Matrix-matrix products

The main view is as inner products between the row vectors of $A$ and the column vectors of $B$:

$$C = AB = \begin{bmatrix} a_1^\top b_1 & \dots & a_1^\top b_d \\ \vdots & \ddots & \vdots \\ a_n^\top b_1 & \dots & a_n^\top b_d \end{bmatrix}$$

where $a_i^\top$ is the $i$-th row of $A$ and $b_j$ is the $j$-th column of $B$.

If $A\in\mathbb{R}^{n\times m}$ and $B\in\mathbb{R}^{m\times d}$, then $C\in\mathbb{R}^{n\times d}$.
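
A short NumPy sketch (arbitrary shapes) checking the inner-product view entry by entry:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 5))   # n=2, m=5
B = rng.normal(size=(5, 3))   # m=5, d=3
C = A @ B                     # shape (2, 3)

# Each entry is the inner product of a row of A with a column of B
for i in range(C.shape[0]):
    for j in range(C.shape[1]):
        assert np.isclose(C[i, j], A[i, :] @ B[:, j])
```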

7.2.4 Matrix manipulations

Summing slices

Let $X\in\mathbb{R}^{N\times D}$. We can average the rows of $X$ by pre-multiplying by a vector of ones:

$$\bar{\bold{x}}^\top=\frac{1}{N}\bold{1}_N^\top X= \frac{1}{N}\begin{bmatrix}\sum_{i=1}^N x_{i1}&\dots & \sum_{i=1}^N x_{iD}\end{bmatrix}$$

Conversely, we can average the columns of $X$ by post-multiplying by a vector of ones.

Hence, the overall mean is given by:

$$\bar{x}=\frac{1}{ND}\bold{1}_N^\top X \bold{1}_D$$
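
In NumPy, with an arbitrary data matrix `X` of shape `(N, D)`, these operations look as follows:

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 6, 4
X = rng.normal(size=(N, D))
ones_N, ones_D = np.ones(N), np.ones(D)

# Mean of each column (average over the rows), a vector in R^D
xbar = ones_N @ X / N
assert np.allclose(xbar, X.mean(axis=0))

# Mean of each row (average over the columns), a vector in R^N
row_means = X @ ones_D / D
assert np.allclose(row_means, X.mean(axis=1))

# Overall mean, a scalar
assert np.isclose(ones_N @ X @ ones_D / (N * D), X.mean())
```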

Scaling

If we pre-multiply $X$ by a diagonal matrix $S=\mathrm{diag}(\bold{s})$, we scale each row $i$ by $s_i$:

$$\mathrm{diag}(\bold{s})\,X=\begin{bmatrix}s_1 & & \\ & \ddots & \\ & & s_N\end{bmatrix}\begin{bmatrix}x_{11} & \dots & x_{1D} \\ \vdots & \ddots & \vdots \\ x_{N1} & \dots & x_{ND}\end{bmatrix}=\begin{bmatrix}s_1 x_{11} & \dots & s_1 x_{1D} \\ \vdots & \ddots & \vdots \\ s_N x_{N1} & \dots & s_N x_{ND}\end{bmatrix}$$

Conversely, we scale the columns of $X$ by post-multiplying with the diagonal matrix $S$.

Therefore, the standardization operation can be written as:

$$X^{std}=(X-\bold{1}_N\mu^\top)\,\mathrm{diag}(\sigma)^{-1}$$

where $\mu$ is the empirical mean vector and $\sigma$ is the empirical standard deviation vector.
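
A sketch of this standardization in NumPy, assuming `mu` and `sigma` are computed empirically from the columns of a synthetic `X`; the broadcasting one-liner at the end is the idiomatic equivalent:

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 100, 3
X = rng.normal(loc=5.0, scale=2.0, size=(N, D))

mu = X.mean(axis=0)      # empirical mean of each column
sigma = X.std(axis=0)    # empirical standard deviation of each column

# X_std = (X - 1_N mu^T) diag(sigma)^{-1}
X_std = (X - np.outer(np.ones(N), mu)) @ np.linalg.inv(np.diag(sigma))

# Same result using broadcasting, the idiomatic NumPy way
assert np.allclose(X_std, (X - mu) / sigma)
```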

Scatter matrix

The sum of squares matrix $S_0\in\mathbb{R}^{D\times D}$ is:

$$S_0\triangleq X^\top X=\sum_{i=1}^N x_i x_i^\top$$

The scatter matrix $S_{\bar{x}}\in\mathbb{R}^{D\times D}$ is:

$$S_{\bar{x}}\triangleq\sum_{i=1}^N(x_i-\bar{x})(x_i-\bar{x})^\top=\sum_{i=1}^N x_i x_i^\top-N\bar{x}\bar{x}^\top$$

The Gram matrix $K\in\mathbb{R}^{N\times N}$ is:

$$K\triangleq XX^\top$$

The squared pairwise distance between the rows of $X\in \mathbb{R}^{m\times d}$ and the rows of $Y\in\mathbb{R}^{n \times d}$ is:

$$D_{ij}=\|x_i-y_j\|^2_2=(x_i-y_j)^\top(x_i-y_j)=\|x_i\|^2_2-2x_i^\top y_j+\|y_j\|^2_2$$

This can also be written as:

$$D= \hat{x}\bold{1}_n^\top-2XY^\top+\bold{1}_m\hat{y}^\top$$

where $\hat{x}=\mathrm{diag}(XX^\top)$ and $\hat{y}=\mathrm{diag}(YY^\top)$ are the vectors of squared norms of the rows of $X$ and $Y$.
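
These quantities are easy to verify in NumPy; in particular, the pairwise-distance identity avoids explicit loops over rows (the shapes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, d = 5, 7, 3
X = rng.normal(size=(m, d))
Y = rng.normal(size=(n, d))

# Scatter matrix: sum_i x_i x_i^T - N xbar xbar^T
xbar = X.mean(axis=0)
S = X.T @ X - m * np.outer(xbar, xbar)
assert np.allclose(S, (X - xbar).T @ (X - xbar))

# Squared pairwise distances via ||x||^2 - 2 x^T y + ||y||^2
x_sq = np.sum(X**2, axis=1)           # diag(X X^T), shape (m,)
y_sq = np.sum(Y**2, axis=1)           # diag(Y Y^T), shape (n,)
D2 = x_sq[:, None] - 2 * X @ Y.T + y_sq[None, :]

# Reference computation with an explicit double loop
D2_ref = np.array([[np.sum((X[i] - Y[j])**2) for j in range(n)] for i in range(m)])
assert np.allclose(D2, D2_ref)
```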

7.2.5 Kronecker products

If $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{p\times q}$, then the Kronecker product $A\otimes B$ belongs to $\mathbb{R}^{mp\times nq}$:

$$A \otimes B= \begin{bmatrix} a_{11}B & \dots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \dots & a_{mn}B \end{bmatrix}$$

Some useful properties:

$$(A\otimes B)^{-1}=A^{-1}\otimes B^{-1}$$

$$(A \otimes B)\,\mathrm{vec}(C)=\mathrm{vec}(BCA^\top)$$
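
NumPy exposes the Kronecker product as `np.kron`; the sketch below (arbitrary shapes) checks both properties, using column-major (Fortran) order for `vec`, which is the convention under which the vec identity holds:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, p, q = 2, 3, 4, 5
A = rng.normal(size=(m, n))
B = rng.normal(size=(p, q))
C = rng.normal(size=(q, n))

K = np.kron(A, B)
assert K.shape == (m * p, n * q)

# vec() stacks the columns, i.e. column-major (Fortran) order
vec = lambda M: M.flatten(order="F")

# (A kron B) vec(C) = vec(B C A^T)
assert np.allclose(K @ vec(C), vec(B @ C @ A.T))

# Inverse property (A and B must be square and invertible here)
A2, B2 = rng.normal(size=(3, 3)), rng.normal(size=(2, 2))
assert np.allclose(np.linalg.inv(np.kron(A2, B2)),
                   np.kron(np.linalg.inv(A2), np.linalg.inv(B2)))
```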

7.2.6 Einstein summation

Einstein summation (einsum) consists in dropping the $\sum$ operators when writing matrix and tensor products, the summation over repeated indices being implicit:

$$C_{ij}=\sum_k A_{ik}B_{kj} \quad\Rightarrow\quad C_{ij}=A_{ik}B_{kj}$$

And for more complex operations:

$$L_{nc}=\sum_d\sum_k\sum_t S_{ntk}W_{kd}V_{dc} \quad\Rightarrow\quad L_{nc}=S_{ntk}W_{kd}V_{dc}$$

Einsum implementations can also choose the contraction order of such expressions, minimizing time and memory allocation.
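
Both examples map directly onto `np.einsum`; passing `optimize=True` lets NumPy pick a good contraction order (the shapes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)

# Matrix product: C_ij = A_ik B_kj (k is summed because it does not appear in the output)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))
assert np.allclose(np.einsum("ik,kj->ij", A, B), A @ B)

# L_nc = S_ntk W_kd V_dc, summing over t, k and d
N, T, K, D, C = 2, 3, 4, 5, 6
S = rng.normal(size=(N, T, K))
W = rng.normal(size=(K, D))
V = rng.normal(size=(D, C))
L = np.einsum("ntk,kd,dc->nc", S, W, V, optimize=True)

# Reference computation done in two explicit steps
L_ref = np.einsum("ntd,dc->nc", np.einsum("ntk,kd->ntd", S, W), V)
assert np.allclose(L, L_ref)
assert L.shape == (N, C)
```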