MC

class spalor.models.MC(n_components=10, normalize=False, solver='lmafit')[source]

Matrix completion. There are two main ways to use this class:

  • PCA when some proportion of the data is missing. This class will calculate the principal components with the data available. This can be used to fill in the missing data, or the principal components and scores can be used on their own as if the data was never missing to begin with.

  • A supervised machine learning algorithm based on collaborative filtering. Typically, this is thought of as a recommendation system where d1 is the number of users, d2 is the number of items, and the values are the users' ratings of the items. The features are the indices of the user and the item, and the target variable is the rating.

See the user guide <http://www.spalor.org/user_guide/matrix_completion> for a detailed description.

n_components : int, default=10

Number of principal components to solve for, that is, the rank of the matrix to be completed. If set to a number between 0 and 1, the parameter will be taken to be the ratio of the smallest singular value to the largest.

solver : {'lmafit', 'svt', 'alt_min', 'alt_proj'}, default='lmafit'

Solver to use; see ../algorithms/mc_algorithms.

normalize : (optional) bool, default=False

Whether to normalize the columns of X prior to fitting the model.
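The `solver` options above name different completion algorithms. To give a feel for what such a solver does, here is a minimal NumPy sketch of alternating least squares, the general idea behind alternating-minimization approaches. It is not spalor's `alt_min` implementation: the function name `alt_min_complete` and its arguments are hypothetical, and the sketch omits regularization and convergence checks.

```python
import numpy as np

def alt_min_complete(A, rank, n_iter=100, seed=0):
    """Fill the NaN entries of A with a rank-`rank` fit (illustrative only)."""
    mask = ~np.isnan(A)                      # True where an entry is observed
    d1, d2 = A.shape
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((d1, rank))      # left factor, one row per matrix row
    R = rng.standard_normal((d2, rank))      # right factor, one row per matrix column
    for _ in range(n_iter):
        # Fix R and solve a small least-squares problem for each row of L,
        # using only the observed entries of that row.
        for i in range(d1):
            obs = mask[i]
            L[i] = np.linalg.lstsq(R[obs], A[i, obs], rcond=None)[0]
        # Fix L and solve for each row of R (one per column of A).
        for j in range(d2):
            obs = mask[:, j]
            R[j] = np.linalg.lstsq(L[obs], A[obs, j], rcond=None)[0]
    return L @ R.T

# Rank-1 ground truth with three entries hidden
M = np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0])
A = M.copy()
A[0, 1] = A[2, 3] = A[4, 0] = np.nan

M_hat = alt_min_complete(A, rank=1)
```

Each inner step is an ordinary least-squares fit restricted to the observed entries, which is why the approach only needs the known values and never the full matrix.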

d1 : int

Number of rows in matrix (typically, the number of samples in the dataset)

d2 : int

Number of columns in the matrix (typically, the number of features in the dataset)

U : ndarray of size (d1, n_components)

Left singular vectors.

S : ndarray of size (n_components,)

Singular values.

V : ndarray of size (d2, n_components)

Right singular vectors.

T : ndarray of size (d1, n_components)

Score matrix, U*S. Often used for classification from PCA.

components : ndarray of size (d2, n_components)

Principal axes in feature space, representing the directions of maximum variance in the data.
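The relationship between these attributes can be illustrated with plain NumPy on a fully observed low-rank matrix. This is a sketch of the linear algebra only: the variable names mirror the attributes above, but the arrays come from `np.linalg.svd`, not from a fitted `MC` object.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r = 6, 4, 2
M = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))  # exactly rank r

U_full, S_full, Vt_full = np.linalg.svd(M, full_matrices=False)
U = U_full[:, :r]        # (d1, r) left singular vectors
S = S_full[:r]           # (r,)    singular values
V = Vt_full[:r].T        # (d2, r) right singular vectors

T = U * S                # scores: each column of U scaled by its singular value
M_rebuilt = T @ V.T      # U diag(S) V^T recovers the rank-r matrix
```

Projecting the data onto the right singular vectors, `M @ V`, gives the same score matrix `T`.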

```
import numpy as np
from spalor.models import MC

# Matrix with missing entries marked as np.nan
A = np.array([[1, 1, 2, 0],
              [2, 1, 3, np.nan],
              [1, 2, np.nan, -1]])

mc = MC(n_components=2)
mc.fit(A)

print("Full matrix:\n", mc.to_matrix())
```

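```
import numpy as np
from spalor.models import MC

# Row indices (first row of X) and column indices (second row) of the known entries
X = np.array([[0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
              [0, 1, 2, 3, 0, 1, 2, 0, 1, 3]])
# Known values at those positions
y = np.array([1, 1, 2, 0, 2, 1, 3, 1, 2, -1])

mc = MC(n_components=2)
mc.fit(X, y)

print("Full matrix:\n", mc.to_matrix())

print("Entry (1,3): ", mc.predict(np.array([[1, 3]]).T))
print("Entry (2,2): ", mc.predict(np.array([[2, 2]]).T))
```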
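The two examples describe the same data in two forms. As a sketch using only NumPy, the index/value form can be derived from the dense matrix with missing entries as follows. The arrays follow the layout used in the second example (one row of row indices, one row of column indices); transpose `X` if an (n,2) layout is needed.

```python
import numpy as np

A = np.array([[1, 1, 2, 0],
              [2, 1, 3, np.nan],
              [1, 2, np.nan, -1]])

rows, cols = np.nonzero(~np.isnan(A))   # positions of the known entries, row by row
X = np.vstack([rows, cols])             # shape (2, n)
y = A[rows, cols]                       # known values, length n
```

For this matrix, `X` and `y` hold the same indices and values as the arrays written out by hand in the second example.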

fit(X, y=None, missing_val='nan')[source]
Parameters
  • X (ndarray of size (d1,d2) or (n,2)) – either the matrix to fit with missing values, or the rows and columns where entries are known. In the second case, y is required.

  • y ((optional) 1d array with length n) – known values of the matrix if X is shape (n,2)

  • missing_val ((optional) str or float, default: "nan") – if X is of size (d1,d2), the placeholder that marks missing entries. If missing entries are np.nan, pass the string "nan".

Returns

MC model fit to input.

Return type

MC
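One reason a string such as "nan" is a convenient placeholder name is that NaN never compares equal to itself, so a plain equality test cannot locate NaN entries. The helper below is a hypothetical illustration of how a placeholder could be turned into a mask of known entries; `known_mask` is not part of spalor.

```python
import numpy as np

def known_mask(A, missing_val="nan"):
    """Return a boolean array that is True where an entry of A is known."""
    A = np.asarray(A, dtype=float)
    if isinstance(missing_val, str) and missing_val == "nan":
        return ~np.isnan(A)          # NaN needs isnan, since nan != nan
    return A != missing_val          # any other placeholder: plain comparison

A_nan = np.array([[1.0, np.nan], [np.nan, 4.0]])
A_flag = np.array([[1.0, -999.0], [-999.0, 4.0]])

mask_nan = known_mask(A_nan)             # missing entries are NaN
mask_flag = known_mask(A_flag, -999.0)   # missing entries are -999
```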

fit_transform(X, y=None)[source]

Fit the model and return the principal components.

Parameters
  • X (ndarray of size (d1,d2) or (n,2)) – either the matrix to fit with missing values, or the rows and columns where entries are known. In the second case, y is required.

  • y ((optional) 1d array with length n) – known values of the matrix if X is shape (n,2)

  • missing_val ((optional) str or float, default: "nan") – if X is of size (d1,d2), the placeholder that marks missing entries. If missing entries are np.nan, pass the string "nan".

Returns

Principal components.

Return type

ndarray of size (d1, n_components)

get_covariance()[source]

Calculates an estimate of the covariance matrix.

Entry (i,j) is the correlation between feature i and feature j. A value close to 1 indicates a strong positive correlation, a value close to -1 a strong negative correlation, and a value close to 0 no correlation.

Returns

cov – Estimated covariance of data.

Return type

array, shape=(d2, d2)
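As a sketch of how such an estimate can be obtained from a factorization, the snippet below builds the covariance of centered data from its singular values and right singular vectors, then rescales it to a correlation matrix whose entries lie in [-1, 1]. This illustrates the standard identity with NumPy only. Whether `get_covariance` centers or rescales in the same way is an assumption, not something this page confirms.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((50, 4))
M -= M.mean(axis=0)                          # center each feature (column)

_, S, Vt = np.linalg.svd(M, full_matrices=False)
V = Vt.T

# V diag(S^2) V^T / (d1 - 1) equals the sample covariance of centered data
cov = (V * S**2) @ V.T / (M.shape[0] - 1)

# Dividing by the standard deviations turns covariance into correlation
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)
```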

predict(X)[source]
Parameters

X (ndarray of size (n,2)) – pairs of indices for which to predict the value of the matrix

Returns

Predicted entries.

Return type

1d array of length n
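Individual entries of a low-rank matrix can be evaluated directly from its factors, without forming the full matrix. The sketch below shows the arithmetic with stand-in factor arrays named after the attributes U, S and V; the random values are placeholders, and this is not a claim about how `predict` is implemented.

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2, r = 5, 4, 2
U = rng.standard_normal((d1, r))     # stand-in for left singular vectors
S = np.array([3.0, 1.0])             # stand-in for singular values
V = rng.standard_normal((d2, r))     # stand-in for right singular vectors

pairs = np.array([[1, 3], [2, 2], [4, 0]])   # (row, column) pairs, shape (n, 2)
i, j = pairs[:, 0], pairs[:, 1]

# Entry (i, j) of U diag(S) V^T is sum_k U[i, k] * S[k] * V[j, k]
values = np.sum(U[i] * S * V[j], axis=1)
```

Each prediction costs on the order of n_components operations, regardless of d1 and d2.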

to_matrix()[source]

Calculates the completed matrix.

Warning: In some cases, the completed matrix may be too large to fit in memory, for example when the model is used for a recommendation system.

Returns

M – Completed matrix

Return type

ndarray of size (d1,d2)
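A quick back-of-the-envelope calculation shows why the warning matters. The sizes below are hypothetical, chosen to resemble a large recommendation problem, and 8 bytes per entry assumes float64 storage.

```python
d1, d2, r = 1_000_000, 100_000, 10    # hypothetical: users, items, n_components
bytes_per_float = 8                   # float64

dense_bytes = d1 * d2 * bytes_per_float                   # full (d1, d2) matrix
factor_bytes = (d1 * r + r + d2 * r) * bytes_per_float    # U, S and V together

print(f"dense matrix: {dense_bytes / 1e9:.0f} GB")   # 800 GB
print(f"factors:      {factor_bytes / 1e6:.0f} MB")  # 88 MB
```

In such a setting, predicting only the entries of interest from the factors avoids materializing the dense matrix.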