Can someone help me figure out why I'm getting this error: ValueError: n_components must be < n_features; got 10 >= 0

import pandas as pd
from scipy.sparse import csr_matrix
users = pd.read_table(open('ml-1m/users.dat', encoding = "ISO-8859-1"), sep=':', header=None, names=['user_id', 'gender', 'age', 'occupation', 'zip'])
ratings = pd.read_table(open('ml-1m/ratings.dat', encoding = "ISO-8859-1"), sep=':', header=None, names=['user_id', 'movie_id', 'rating', 'timestamp'])
movies = pd.read_table(open('ml-1m/movies.dat', encoding = "ISO-8859-1"), sep=':', header=None, names=['movie_id', 'title', 'genres'])
MovieLens = pd.merge(pd.merge(ratings, users), movies)

ratings_mtx_df = MovieLens.pivot_table(values='rating', index='user_id', columns='title', fill_value=0)
movie_index = ratings_mtx_df.columns

from sklearn.decomposition import TruncatedSVD
recom = TruncatedSVD(n_components=10, random_state=101)
R = recom.fit_transform(ratings_mtx_df.values.T)

ValueError                                Traceback (most recent call last)
<ipython-input-8-0bd6c9bda95a> in <module>()
      1 from sklearn.decomposition import TruncatedSVD
      2 recom = TruncatedSVD(n_components=10, random_state=101)
----> 3 R = recom.fit_transform(ratings_mtx_df.values.T)

C:\Users\renau\Anaconda3\lib\site-packages\sklearn\decomposition\truncated_svd.py in fit_transform(self, X, y)
    168             if k >= n_features:
    169                 raise ValueError("n_components must be < n_features;"
--> 170                                  " got %d >= %d" % (k, n_features))
    171             U, Sigma, VT = randomized_svd(X, self.n_components,
    172                                           n_iter=self.n_iter,

ValueError: n_components must be < n_features; got 10 >= 0
  • The problem is with the shape of input to that `transform` call. It's an array derived from the dataframe. Check its shape/size. Why the `csr` in title? You haven't created a sparse matrix. – hpaulj Sep 22 '17 at 07:16
  • hpaulj, can you give me more details? Sorry, I don't get it... Do you mean this part: users = pd.read_table(open('ml-1m/users.dat', encoding = "ISO-8859-1"), sep=' –  Sep 22 '17 at 07:35
  • Focus on `ratings_mtx_df.values` – hpaulj Sep 22 '17 at 10:57

1 Answer

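You're asking for 10 components, but as per the documentation for TruncatedSVD, the number of features (columns) in the array you pass in needs to be greater than the number of components you're looking to extract. The "got 10 >= 0" in the message means the array reaching fit_transform has 0 features, so check the shape of ratings_mtx_df (and of ratings_mtx_df.values.T) before fitting. Once the array has columns, pick an n_components smaller than the number of features, e.g. n_components=3 if you've got more than 3 features in your data, and see if that's any better.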
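The constraint above can be checked before fitting. Here is a minimal sketch, assuming a small random array stands in for the real ratings matrix; the helper name `fit_svd_checked` is hypothetical and not part of scikit-learn. It prints the input shape and refuses to fit unless there are more features than components:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD


def fit_svd_checked(X, n_components=10, random_state=101):
    """Hypothetical helper: check the shape, then run TruncatedSVD."""
    X = np.asarray(X)
    n_samples, n_features = X.shape
    print("input shape:", X.shape)
    # Same condition the traceback in the question is complaining about.
    if n_components >= n_features:
        raise ValueError(
            "need n_components < n_features; got %d >= %d"
            % (n_components, n_features)
        )
    svd = TruncatedSVD(n_components=n_components, random_state=random_state)
    return svd.fit_transform(X)


# Made-up data: 50 observations (rows) with 20 features (columns).
rng = np.random.RandomState(0)
demo = rng.rand(50, 20)

R = fit_svd_checked(demo)
print(R.shape)  # (50, 10): one row per observation, one column per component
```

Calling the helper on your own `ratings_mtx_df.values.T` should show right away whether the array has any columns at all.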

Also, you're transposing your input data with the .T attribute in:

R = recom.fit_transform(ratings_mtx_df.values.T)

That swaps features (columns) with observations (rows), which might explain why the fit_transform method isn't working.
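To see what the transpose does to the shape, here is a small sketch on made-up ratings (the toy data is an assumption, not the MovieLens files). It also shows how a pivot table with no rows ends up as 0 features after `.T`, which matches the "got 10 >= 0" in the error:

```python
import pandas as pd

# Toy ratings standing in for the merged MovieLens frame (made-up values).
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "title":   ["A", "B", "A", "B", "C", "C"],
    "rating":  [5, 3, 4, 2, 1, 5],
})

mtx = toy.pivot_table(values="rating", index="user_id",
                      columns="title", fill_value=0)

print(mtx.shape)           # (4, 3): 4 users (rows) x 3 titles (columns)
print(mtx.values.T.shape)  # (3, 4): 3 titles (rows) x 4 users (features)

# A pivot table with no rows, e.g. if the merged frame had no usable user_id.
empty = mtx.iloc[0:0]
print(empty.values.T.shape)  # (3, 0): 0 features after the transpose
```

If your real `ratings_mtx_df.shape` reports 0 rows, the problem is upstream of TruncatedSVD. One thing worth checking, stated as an assumption about your files: the MovieLens 1M `.dat` files separate fields with `::`, so reading them with `sep='::', engine='python'` rather than `sep=':'` may be what's needed.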

Thomas Kimber
  • I'm new to Python and not sure I understand... I guess I'll figure it out with some practice. I'll post again in a month! Thanks for your help –  Sep 22 '17 at 18:29