RPCA
- class spalor.models.RPCA(n_components=10, sparsity=0.05)
Robust Principal Component Analysis.
Simultaneously performs PCA while identifying and correcting outliers.
See the user guide <http://www.spalor.org/user_guide/rpca> for a detailed description.
- n_components : int
Number of principal components to solve for, that is, the rank of the matrix to be completed. If set to a number between 0 and 1, the parameter will be taken to be the ratio of the smallest singular value to the largest.
- solver : {'lmafit', 'svt', 'alt_min', 'alt_proj'}, default='lmafit'
Solver to use; see ../algorithms/mc_algorithms.
- lambda : float, must be larger than 0, default=0.5
Regularization parameter. Only used if solver='svt' or 'apgd'.
Increasing the parameter reduces overfitting, but may lead to estimation bias towards zero, particularly with solver='svt'.
- tol : float, default=1e-6
Stopping criterion for the matrix completion solver.
- d1 : int
Number of rows in the matrix (typically, the number of samples in the dataset).
- d2 : int
Number of columns in the matrix (typically, the number of features in the dataset).
- U : ndarray of size (d1, n_components)
Left singular vectors.
- S : ndarray of size (n_components,)
Singular values.
- V : ndarray of size (d2, n_components)
Right singular vectors. Often, these are the principal component axes, or the basis.
- T : ndarray of size (d1, n_components)
Score matrix, U*S. Often used for classification from PCA.
- outliers : ndarray of size (d1, d2)
- components : ndarray of size (d2, n_components)
Principal axes in feature space, representing the directions of maximum variance in the data.
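The relationship between the attributes U, S, V and T can be illustrated with a plain NumPy SVD. This is only a sketch of the linear algebra on an outlier-free matrix, not the solver the library runs; the variable names mirror the attribute names above, and `L`, `d1`, `d2` and `r` are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r = 50, 30, 2

# Exactly rank-r data matrix (no outliers), for illustration only.
L = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))

# Thin SVD, truncated to the leading r components.
U_full, S_full, Vt_full = np.linalg.svd(L, full_matrices=False)
U = U_full[:, :r]        # (d1, r) left singular vectors
S = S_full[:r]           # (r,)    singular values
V = Vt_full[:r, :].T     # (d2, r) right singular vectors

# Score matrix: scale each column of U by its singular value.
T = U * S                # (d1, r)

# The scores equal the data projected onto the principal axes,
# and T @ V.T recovers the low-rank matrix.
assert np.allclose(T, L @ V)
assert np.allclose(T @ V.T, L)
```

Because the columns of V are orthonormal, projecting the data onto V and scaling U by S give the same score matrix.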
Example:
```
import numpy as np
from spalor.models import RPCA

# Rank-2 matrix A plus a sparse outlier matrix S (about 10% of entries set to 1)
A = np.random.randn(50, 2).dot(np.random.randn(2, 30))
S = np.random.rand(*A.shape) < 0.1

rpca = RPCA(n_components=2, sparsity=0.1)
rpca.fit(A + S)

print("Denoised matrix error: \n", np.linalg.norm(rpca.to_matrix() - A) / np.linalg.norm(A))
print("Outliers error: \n", np.linalg.norm(rpca.outliers_ - S) / np.linalg.norm(S))
```
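To make the roles of `n_components` and `sparsity` more concrete, here is a minimal NumPy sketch of one simple way to split a matrix into a low-rank part plus a sparse part: alternate between a truncated SVD and keeping only the largest residual entries. The function `rpca_alt_proj_sketch` and its `n_iter` argument are hypothetical names introduced for this illustration; this is not the library's implementation, and its solvers may work quite differently.

```python
import numpy as np

def rpca_alt_proj_sketch(M, n_components=2, sparsity=0.1, n_iter=50):
    """Hypothetical helper: split M into a low-rank L plus a sparse S."""
    S = np.zeros_like(M)
    k = int(sparsity * M.size)  # number of entries allowed to be outliers
    for _ in range(n_iter):
        # Low-rank step: best rank-n_components approximation of M - S.
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        # Sparse step: keep only the k largest-magnitude residual entries.
        R = M - L
        S = np.zeros_like(M)
        if k > 0:
            idx = np.argpartition(np.abs(R), -k, axis=None)[-k:]
            S.flat[idx] = R.flat[idx]
    return L, S

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30))

# Corrupt roughly 5% of the entries with large values of random sign.
mask = rng.random(A.shape) < 0.05
outliers = np.where(mask, 10.0 * rng.choice([-1.0, 1.0], size=A.shape), 0.0)

L_hat, S_hat = rpca_alt_proj_sketch(A + outliers, n_components=2, sparsity=0.1)
print("Relative error of low-rank part:", np.linalg.norm(L_hat - A) / np.linalg.norm(A))
```

In this sketch `sparsity` acts as an upper bound on the fraction of entries that may be flagged as outliers, so it is set somewhat above the true corruption rate.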
- fit(M)
- Parameters
M (ndarray) – observed data matrix with an unknown but sparse set of outliers
- fit_transform(M)
- Parameters
M (ndarray of size (d1,d2)) – observed data matrix with an unknown but sparse set of outliers
- Returns
T
- Return type
ndarray of size (d1, n_components)
- get_covariance()
Calculates an estimate of the covariance matrix.
Entry (i,j) will be the correlation between feature i and feature j. A value close to 1 is a strong positive correlation, a value close to -1 is a strong negative correlation, and a value close to 0 is no correlation.
- Returns
cov – Estimated covariance of data.
- Return type
array, shape=(d2, d2)
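How get_covariance computes its estimate is not stated here. As a rough sketch under that caveat, a (d2, d2) covariance estimate can be assembled from low-rank factors of centered data, and then rescaled to a correlation matrix whose entries lie between -1 and 1. All names below are chosen for this illustration and use NumPy only.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r = 200, 5, 2

# Low-rank data, centered so that each feature (column) has zero mean.
X = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
X = X - X.mean(axis=0)

# Truncated SVD of the centered data.
_, S, Vt = np.linalg.svd(X, full_matrices=False)
S, V = S[:r], Vt[:r].T

# Covariance estimate built from the factors: V diag(S^2) V^T / (d1 - 1).
cov = (V * S**2) @ V.T / (d1 - 1)

# Correlation: rescale by the per-feature standard deviations.
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)

# For exactly low-rank data this matches NumPy's sample estimates.
assert np.allclose(cov, np.cov(X, rowvar=False))
assert np.allclose(corr, np.corrcoef(X, rowvar=False))
```

Only the rescaled matrix `corr` has the "close to 1, close to -1, close to 0" interpretation; the raw covariance entries depend on the scale of each feature.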