RPCA

class spalor.models.RPCA(n_components=10, sparsity=0.05)[source]

Robust Principal Component Analysis.

Simultaneously performs PCA while identifying and correcting outliers.

See the user guide <http://www.spalor.org/user_guide/rpca> for a detailed description.

n_components : int

Number of principal components to solve for, that is, the rank of the matrix to be completed. If set to a number between 0 and 1, the parameter will be taken to be the ratio of the smallest singular value to the largest.

solver : {'lmafit', 'svt', 'alt_min', 'alt_proj'}, default='lmafit'

Solver to use; see ../algorithms/mc_algorithms.

lambda : float, must be larger than 0, default 0.5

Regularization parameter. Only used if solver='svt' or 'apgd'.

Increasing the parameter reduces overfitting, but may lead to estimation bias towards zero, particularly with solver='svt'.

tol : float, default=1e-6

Stopping criterion for the matrix completion solver.
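The source does not spell out how a fractional `n_components` is turned into a rank. One plausible reading, sketched below with plain NumPy, is to keep every singular value whose ratio to the largest one is at least the given threshold. The helper `rank_from_ratio` is hypothetical and is not part of spalor.

```python
import numpy as np

def rank_from_ratio(M, ratio):
    """Hypothetical helper: count singular values s_i with s_i / s_max >= ratio."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values, sorted descending
    return int(np.sum(s >= ratio * s[0]))

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30))  # exactly rank 2
A_noisy = A + 1e-6 * rng.standard_normal(A.shape)                # tiny dense noise

# The noise singular values fall far below 1% of the largest one,
# so only the two signal directions are counted.
rank = rank_from_ratio(A_noisy, 0.01)
```

Under this reading, a smaller ratio keeps more components and a ratio close to 1 keeps only the dominant ones.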

d1 : int

Number of rows in the matrix (typically, the number of samples in the dataset).

d2 : int

Number of columns in the matrix (typically, the number of features in the dataset).

U : ndarray of size (d1, n_components)

Left singular vectors.

S : ndarray of size (n_components,)

Singular values.

V : ndarray of size (d2, n_components)

Right singular vectors. Often, these are the principal component axes, or the basis

T : ndarray of size (d1, n_components)

Score matrix, U*S. Often used for classification from PCA.

outliers : ndarray of size (d1,d2)

components : ndarray of size (d2, n_components)

Principal axes in feature space, representing the directions of maximum variance in the data.
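The shapes of the U, S, V and T attributes listed above match those of a truncated SVD. The sketch below uses only NumPy (not spalor) to show how such factors relate: T is U with each column scaled by the corresponding singular value, and the low-rank matrix is recovered as T times the transpose of V. It assumes the usual SVD convention and does not claim to reproduce how RPCA computes these attributes internally.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30))  # rank-2 matrix, d1=50, d2=30

r = 2  # plays the role of n_components
U_full, S_full, Vt_full = np.linalg.svd(L, full_matrices=False)
U = U_full[:, :r]   # (d1, r) left singular vectors
S = S_full[:r]      # (r,)    singular values
V = Vt_full[:r].T   # (d2, r) right singular vectors

T = U * S           # broadcasting scales column k of U by S[k], i.e. U @ diag(S)
L_rebuilt = T @ V.T # low-rank matrix reassembled from the factors
```

Because the columns of V are orthonormal, the scores can also be obtained by projecting the data onto V, that is, `L @ V`.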

Example:
```
import numpy as np
from spalor.models import RPCA

# Rank-2 matrix plus sparse outliers on roughly 10% of the entries
A = np.random.randn(50, 2).dot(np.random.randn(2, 30))
S = np.random.rand(*A.shape) < 0.1

rpca = RPCA(n_components=2, sparsity=0.1)
rpca.fit(A + S)

print("Denoised matrix error:\n", np.linalg.norm(rpca.to_matrix() - A) / np.linalg.norm(A))
print("Outliers error:\n", np.linalg.norm(rpca.outliers_ - S) / np.linalg.norm(S))
```

fit(M)[source]
Parameters

M (ndarray) – observed data matrix with an unknown but sparse set of outliers

fit_transform(M)[source]
Parameters

M (ndarray of size (d1,d2)) – observed data matrix with an unknown but sparse set of outliers

Returns

T

Return type

ndarray of size (d1, r)

get_covariance()[source]

Calculates an estimate of the covariance matrix.

Entry (i,j) will be the correlation between feature i and feature j. A value close to 1 is a strong positive correlation, a value close to -1 is a strong negative correlation, and a value close to 0 is no correlation.

Returns

cov – Estimated covariance of data.

Return type

array, shape=(d2, d2)
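The description above speaks of both covariance and correlation, which differ only by a normalization. As a NumPy-only sketch (not the spalor implementation, whose centering and scaling are not stated here), the covariance of centered data can be assembled from its right singular vectors and singular values, and the correlation follows by dividing by the per-feature standard deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 5))  # d1=200 samples, d2=5 features
Xc = X - X.mean(axis=0)                                          # center each feature

# For centered data Xc = U diag(S) V^T, the sample covariance is V diag(S^2) V^T / (d1 - 1)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
cov = (Vt.T * S**2) @ Vt / (Xc.shape[0] - 1)

# Correlation: rescale so that every diagonal entry becomes 1
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)
```

Entries of `corr` lie in [-1, 1], which is the range the description refers to; entries of `cov` carry the units of the features.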

to_matrix()[source]

Calculates the completed matrix.

Returns

  • L (ndarray of size (d1,d2)) – Low rank matrix, denoised

  • S (sparse matrix of size (d1,d2)) – Sparse outliers

transform()[source]

V is already solved for, so we just need to solve:

min_{U, outliers} ||U*V + outliers - X||_F^2  s.t. outliers is sparse
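One simple way to approach a problem of this form is alternating minimization: with the outliers fixed, U is an ordinary least-squares fit, and with U fixed, the outliers are taken to be the largest-magnitude entries of the residual. The sketch below is an illustration under assumptions, not spalor's actual solver: the function name `transform_sketch`, the `sparsity` and `n_iter` arguments, and the use of hard thresholding are all hypothetical, and V is taken to have shape (d2, n_components), so the low-rank part is written as `U @ V.T`.

```python
import numpy as np

def transform_sketch(X, V, sparsity=0.05, n_iter=100):
    """Hypothetical sketch: alternate a least-squares update of U with a
    hard-thresholding update of the outliers, keeping V fixed."""
    k = int(round(sparsity * X.size))  # number of entries allowed to be outliers
    outliers = np.zeros_like(X)
    for _ in range(n_iter):
        # Least-squares U for fixed outliers: solve V @ U.T ~= (X - outliers).T
        U = np.linalg.lstsq(V, (X - outliers).T, rcond=None)[0].T
        # Keep only the k largest-magnitude residual entries as outliers
        R = X - U @ V.T
        outliers = np.zeros_like(X)
        if k > 0:
            idx = np.argpartition(np.abs(R), -k, axis=None)[-k:]
            outliers.flat[idx] = R.flat[idx]
    return U, outliers

# Synthetic check: known V, rank-2 signal, 2% large outliers with random signs
rng = np.random.default_rng(0)
d1, d2, r, n_out = 60, 100, 2, 120
V = np.linalg.qr(rng.standard_normal((d2, r)))[0]   # orthonormal columns, (d2, r)
L_true = rng.standard_normal((d1, r)) @ V.T
S_true = np.zeros((d1, d2))
pos = rng.choice(d1 * d2, size=n_out, replace=False)
S_true.flat[pos] = 10.0 * rng.choice([-1.0, 1.0], size=n_out)

U_hat, outliers_hat = transform_sketch(L_true + S_true, V, sparsity=n_out / (d1 * d2))
```

The hard-thresholding step needs the number of outliers to be known or bounded in advance, which is the role the `sparsity` argument plays in this sketch.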