Movie Recommendations with Matrix Completion
============================================

Consider a matrix where the rows are Netflix users, the columns are
movies, and the entries are the ratings each user gives each movie.
This matrix will be very sparsely filled in, because most people watch
only a small fraction of the movies on Netflix, but if we can predict
the missing entries, then we can recommend movies someone hasn't seen
yet.

We're going to assume that the data is approximately *low-rank*, which
means that each column can be approximated by a linear combination of
just a handful of other columns. Take the movies The Breakfast Club and
Pretty in Pink as an example. I would bet that the way individuals rate
these two movies is highly correlated, so the columns associated with
each movie should be very similar. Now let's throw Titanic into the
mix. While I wouldn't expect its ratings to be the same, they might be
similar. They might also be similar to the ratings of other period
pieces featuring forbidden love, like Pride and Prejudice, or of other
movies starring Leonardo DiCaprio, like The Wolf of Wall Street. So I
would expect the ratings for Titanic to look like an average of the
ratings for all of these movies. The point is that the ratings for a
specific movie should be pretty close to a linear combination of the
ratings of just a few other similar movies.

A common dataset for movie recommendations comes from MovieLens, and
though they have datasets with up to 25 million ratings, we're going to
stick with 1 million for simplicity. The data can be downloaded from
grouplens.org, or with the following shell commands:

.. code:: ipython3

    !curl https://files.grouplens.org/datasets/movielens/ml-1m.zip -O
    !unzip ml-1m.zip


.. parsed-literal::

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 5778k  100 5778k    0     0  3979k      0  0:00:01  0:00:01 --:--:-- 3977k
    Archive:  ml-1m.zip
       creating: ml-1m/
      inflating: ml-1m/movies.dat
      inflating: ml-1m/ratings.dat
      inflating: ml-1m/README
      inflating: ml-1m/users.dat


Read the data in with NumPy:

.. code:: ipython3

    import numpy as np

    data = np.loadtxt('ml-1m/ratings.dat', delimiter='::')
    print(data[0:3])


.. parsed-literal::

    [[1.00000000e+00 1.19300000e+03 5.00000000e+00 9.78300760e+08]
     [1.00000000e+00 6.61000000e+02 3.00000000e+00 9.78302109e+08]
     [1.00000000e+00 9.14000000e+02 3.00000000e+00 9.78301968e+08]]


The first column is the user ID, the second is the movie ID, the third
is the rating (1, 2, 3, 4, or 5), and the last is a timestamp (which we
don't need to worry about). We want the rows of the matrix to be users
and the columns to be movies.
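To make that layout concrete, we could assemble the sparse user-by-movie
matrix explicitly with ``scipy.sparse``. This is just a sketch for
illustration: the matrix-completion model below works directly with the
(user, movie) index pairs and their ratings, so this matrix never needs
to be formed.

.. code:: ipython3

    from scipy.sparse import coo_matrix

    # Rows are users, columns are movies, entries are the observed ratings.
    users = data[:, 0].astype(int) - 1    # shift IDs to 0-indexed row indices
    movies = data[:, 1].astype(int) - 1   # shift IDs to 0-indexed column indices
    ratings = data[:, 2]

    R = coo_matrix((ratings, (users, movies)))
    print(R.shape)   # (number of users, number of movies)
    print(R.nnz)     # ~1 million observed entries out of shape[0] * shape[1]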
Next we divide the data into training and testing sets. Note that we
also subtract 3 from each rating so that the middle value is 0, and 1
from the user and movie IDs so that they are 0-indexed.

.. code:: ipython3

    X = data[:, [0, 1]].astype(int) - 1
    y = data[:, 2] - 3

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

.. code:: ipython3

    from spalor.models import MC
    from statistics import mean

    mc_model = MC(n_components=5)
    mc_model.fit(X_train, y_train)
    y_predict = mc_model.predict(X_test.T)

    print("MAE:", mean(abs(y_test - y_predict)))
    print("Percent of predictions off by less than 1: ", np.sum(abs(y_test - y_predict) < 1) / len(y_test))


.. parsed-literal::

    MAE: 0.7066785169566365
    Percent of predictions off by less than 1:  0.7507023525059737


The values of ``y_test`` are integers, so for each of the 5 ratings
we'll make a box plot of the corresponding values of ``y_predict``.

.. code:: ipython3

    import seaborn as sns

    ax = sns.boxplot(x=y_test + 3, y=y_predict + 3)
    ax.set_ylim(-5, 10)
    ax.set_ylabel("y_predict")
    ax.set_xlabel("y_test")


.. parsed-literal::

    Text(0.5, 0, 'y_test')


.. image:: movie_lens_mc_files/movie_lens_mc_9_1.png
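Finally, here is a sketch of how the fitted model could be used to make
actual recommendations: score every movie for a single user and keep the
highest-scoring ones. It assumes ``mc_model.predict`` accepts arbitrary
(user, movie) index pairs in the same 2-by-n layout used for
``X_test.T`` above; ``user_id`` is a hypothetical example, and filtering
out movies the user has already rated is omitted for brevity.

.. code:: ipython3

    # Sketch: top-10 recommendations for one (hypothetical) user.
    user_id = 0                                    # 0-indexed user ID
    n_movies = int(data[:, 1].max())               # movie IDs run from 1 to n_movies
    all_pairs = np.column_stack([np.full(n_movies, user_id),
                                 np.arange(n_movies)])

    scores = mc_model.predict(all_pairs.T) + 3     # undo the earlier "- 3" shift
    top10 = np.argsort(scores)[::-1][:10] + 1      # back to 1-indexed movie IDs
    print(top10)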