Robust Principal Component Analysis

While PCA is a powerful technique, it is unreliable when even a sparse set of data points is grossly corrupted. The goal of robust PCA (RPCA) is to identify and remove such outliers by separating the data matrix into the sum of a low-rank matrix and a sparse matrix. For example, consider the low-rank matrix from the matrix completion example with a few entries changed:

\[\begin{split}\begin{bmatrix} 1 & \color{purple}{\textbf{17}} & 3 & 4\\ 3 & 6 & \color{purple}{\textbf{7}} & 12 \\ 5 & 10 & 15 & \color{purple}{\textbf{2}} \\ 7 & \color{purple}{\textbf{3}} & 21 & 28 \\ \end{bmatrix} = {\begin{bmatrix} 1 & 2 & 3 & 4\\ 3 & 6 & 9 & 12 \\ 5 & 10 & 15 & 20 \\ 7 & 14 & 21 & 28 \\ \end{bmatrix}} + { \begin{bmatrix} & 15 & & \\ & & -2 & \\ & & & -18 \\ & -11 & & \\ \end{bmatrix}}\end{split}\]
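As a quick sanity check, the decomposition above can be verified numerically. The matrices here are transcribed directly from the example; nothing about the spalor API is assumed:

import numpy as np

# Rank-1 matrix: outer product of [1, 3, 5, 7] and [1, 2, 3, 4]
L = np.outer([1, 3, 5, 7], [1, 2, 3, 4])

# Sparse corruption: four nonzero entries, all others zero
S = np.zeros((4, 4))
S[0, 1], S[1, 2], S[2, 3], S[3, 1] = 15, -2, -18, -11

M = L + S
print(M)                          # matches the corrupted matrix above
print(np.linalg.matrix_rank(L))   # 1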

RPCA is formulated as the nonconvex optimization problem:

\[\begin{split}\begin{equation} \begin{array}{ll} \underset{L,S \in \mathbb{R}^{d_1 \times d_2}}{\text{minimize }} & \text{rank}(L) + \lambda_0 \|S\|_0 \\ \text{subject to} & L + S = M \end{array} \end{equation}\end{split}\]
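Minimizing the rank and the number of nonzero entries of S directly is NP-hard in general. A standard approach in the robust PCA literature is to solve the convex relaxation below, which replaces rank with the nuclear norm (the sum of singular values) and the entry count with the entrywise l1 norm. This is the textbook relaxation, not necessarily the exact algorithm spalor runs internally:

\[\begin{split}\begin{equation} \begin{array}{ll} \underset{L,S \in \mathbb{R}^{d_1 \times d_2}}{\text{minimize }} & \|L\|_* + \lambda \|S\|_1 \\ \text{subject to} & L + S = M \end{array} \end{equation}\end{split}\]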

The RPCA class

import numpy as np
from spalor.models import RPCA

# Rank-1 matrix: outer product of two random vectors
A = np.random.randn(50, 1).dot(np.random.randn(1, 30))

# Sparse corruption: roughly 10% of entries set to 1
S = (np.random.rand(*A.shape) < 0.1).astype(float)

# Fit RPCA, looking for a rank-1 component with ~10% outliers
rpca = RPCA(n_components=1, sparsity=0.1)
rpca.fit(A + S)

# Relative errors of the recovered low-rank matrix and outliers
print("Denoised matrix error: \n", np.linalg.norm(rpca.to_matrix() - A) / np.linalg.norm(A))
print("Outliers error: \n", np.linalg.norm(rpca.outliers_ - S) / np.linalg.norm(S))
Denoised matrix error:
 4.94329075927598e-16
Outliers error:
 4.510225048268804e-16
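The same workflow applies when the corruptions have arbitrary signed magnitudes rather than all being 1. The sketch below reuses only the constructor arguments and attributes shown above (n_components, sparsity, fit, to_matrix, outliers_); the outlier magnitudes are an illustrative choice:

import numpy as np
from spalor.models import RPCA

A = np.random.randn(50, 1).dot(np.random.randn(1, 30))

# Outliers with random signs and magnitudes on ~10% of entries
mask = np.random.rand(*A.shape) < 0.1
S = mask * np.random.randn(*A.shape) * 10

rpca = RPCA(n_components=1, sparsity=0.1)
rpca.fit(A + S)

# The nonzero entries of rpca.outliers_ are the flagged corruptions
print("Denoised matrix error:", np.linalg.norm(rpca.to_matrix() - A) / np.linalg.norm(A))
print("Outliers error:", np.linalg.norm(rpca.outliers_ - S) / np.linalg.norm(S))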