
21.5 Spectral clustering

Spectral clustering is an approach based on eigenvalue analysis of a pairwise similarity matrix. It uses the eigenvectors to derive feature vectors for each data point, which are then clustered using feature-based clustering methods like K-means.

21.5.1 Normalized cuts

We start by creating an undirected weighted graph $W$, where each data point is a node and the strength of edge $i$-$j$ is a measure of similarity between the corresponding points.

We only connect each node to its nearest neighbors, which keeps the graph sparse and speeds up computation.
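As a rough illustration (not from the text), assuming scikit-learn is available, such a sparse nearest-neighbor graph could be built as follows; the data matrix `X` and the choice of $k=10$ neighbors are placeholders:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.default_rng(0).normal(size=(100, 2))  # placeholder data matrix

# Connect each point to its k nearest neighbors (k=10 is an arbitrary choice),
# then symmetrize so W describes an undirected graph.
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity", include_self=False)
W = 0.5 * (A + A.T)  # sparse symmetric weight matrix
```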

Our goal is to find $K$ clusters of similar points, i.e. to partition the graph into disjoint sets of nodes $S_1,\dots,S_K$ so as to minimize the normalized cut:

$$\mathrm{Ncut}(S_1,\dots,S_K)=\sum_{k=1}^K\frac{\mathrm{cut}(S_k,\bar{S}_k)}{\mathrm{vol}(S_k)}$$

where:

$$\begin{align}
\mathrm{cut}(A,B) &= \frac{1}{2}\sum_{i\in A,\,j\in B} w_{ij} \\
\mathrm{vol}(A) &= \sum_{i\in A}\sum_{j=1}^N w_{ij}
\end{align}$$

and where $\bar{S}_k=V\backslash S_k$ is the complement of $S_k$, $V=\{1,\dots,N\}$ is the set of all nodes, and $\mathrm{vol}(A)$ is the total weight of set $A$ (the sum of the degrees of its nodes).

This splits the graph into $K$ clusters such that the nodes of each cluster are similar to each other but different from the rest.
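As an illustration (not part of the text), here is a small sketch that evaluates the Ncut objective for a given labeling, following the definitions above; `W` is a dense symmetric weight matrix and `labels` a vector of cluster assignments:

```python
import numpy as np

def ncut(W, labels, K):
    """Normalized cut of a partition, following the definitions above."""
    total = 0.0
    for k in range(K):
        in_k = labels == k
        # cut(S_k, complement of S_k), with the 1/2 factor of the definition above
        cut = 0.5 * W[np.ix_(in_k, ~in_k)].sum()
        # vol(S_k): sum of the degrees of the nodes in S_k
        vol = W[in_k].sum()
        total += cut / vol
    return total
```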

We could formulate the Ncut problem using binary indicator vectors $\bold{c}_i\in \{0,1\}^K$, where $c_{ik}=1$ iff point $i$ belongs to cluster $k$.

Unfortunately, this is NP-hard. Below, we discuss a continuous relaxation of the problem based on eigenvectors that is easier to solve.

21.5.2 Eigenvectors of the graph Laplacian encode the clustering

The graph Laplacian is defined by:

$$L \triangleq D - W$$

where $W$ is the symmetric weight matrix of the graph and $D=\mathrm{diag}(d_i)$ is the diagonal matrix containing the weighted degree of each node, $d_i=\sum_{j}w_{ij}$.

In practice, it is important to use a normalized graph Laplacian, to account for the fact that some nodes are more highly connected than others. One way to do this is to create the symmetric matrix:

$$L_{sym}=D^{-1/2}LD^{-1/2}=I-D^{-1/2}WD^{-1/2}$$

The algorithm is as follows:

  • Find the $K$ eigenvectors of $L_{sym}$ with the smallest eigenvalues
  • Stack them as columns of a matrix $U\in \R^{N\times K}$
  • Normalize each row to unit norm, $t_{ij}=u_{ij}/\sqrt{\sum_{k}u_{ik}^2}$, to form the matrix $T$
  • Cluster the rows of $T$ using K-means, then infer the partition of the original points (see the sketch below)
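A minimal sketch of these steps in Python, assuming a precomputed dense similarity matrix and scikit-learn's `KMeans` (an illustrative implementation, not the book's code):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, K):
    """Cluster points given a dense symmetric similarity matrix W into K clusters."""
    # Symmetric normalized Laplacian: L_sym = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt

    # Eigenvectors of the K smallest eigenvalues (eigh returns ascending order)
    _, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :K]

    # Normalize each row to unit norm to form T
    T = U / np.linalg.norm(U, axis=1, keepdims=True)

    # Run K-means on the rows of T; the labels give the partition of the original points
    return KMeans(n_clusters=K, n_init=10).fit_predict(T)
```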

21.5.3 Example

[Figure: clustering of two nested circles by K-means vs. spectral clustering]

We see that for nested circles, K-means does a poor job, since it assumes each cluster corresponds to a spherical Gaussian.

Next, we compute a dense similarity matrix $W$ using a Gaussian kernel:

$$W_{ij}=\exp\left(-\frac{1}{2\sigma^2}||\bold{x}_i-\bold{x}_j||^2_2\right)$$

We then compute the first two eigenvectors of the normalized Laplacian $L_{sym}$. From this we infer the clustering using K-means with $K=2$.
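Putting the pieces together, a hypothetical end-to-end version of this example might look like the following; the nested-circles data from scikit-learn's `make_circles`, the bandwidth $\sigma=0.1$, and the reuse of the `spectral_clustering` sketch above are all illustrative choices:

```python
import numpy as np
from sklearn.datasets import make_circles

# Two nested circles: a case where K-means fails but spectral clustering succeeds.
X, _ = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

# Dense Gaussian-kernel similarity matrix W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
sigma = 0.1
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dists / (2 * sigma**2))

# Spectral clustering with K = 2, using the sketch defined earlier
labels = spectral_clustering(W, K=2)
```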

21.5.4 Connection with other methods

21.5.4.1 Spectral clustering and kPCA

Kernel PCA uses the largest eigenvectors of $W$, which are equivalent to the smallest eigenvectors of $I-W$.
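To see the connection, note that the two matrices share the same eigenvectors:

$$Wv=\lambda v \quad\Longrightarrow\quad (I-W)v=(1-\lambda)v$$

so the eigenvalue ordering is reversed, and the top eigenvectors of $W$ coincide with the bottom eigenvectors of $I-W$.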

In practice, spectral clustering tends to give better results than kPCA.