
I want to build a recommender system using KNN.

I loaded the data:

import pandas as pd

movies = pd.read_csv('Movielens/ml-small/movies.csv')
ratings = pd.read_csv('Movielens/ml-small/ratings.csv')

I removed the movies whose value in the target column "genres" is unique (occurs only once):

mv = movies[movies['genres'].duplicated(keep=False)]
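
For reference, `duplicated(keep=False)` marks every row whose value appears more than once, so this filter keeps only genre strings shared by at least two movies. A small sketch on made-up data (the `toy` frame is hypothetical, not taken from the CSVs):

```python
import pandas as pd

# Hypothetical toy frame: "Comedy" appears twice, "Drama" once.
toy = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": ["Comedy", "Comedy", "Drama"],
})

# keep=False flags every member of a duplicated group, not just the repeats.
mask = toy["genres"].duplicated(keep=False)
kept = toy[mask]
print(kept)
```

Note that this counts genres per movie, not per rating row.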

I merged the ratings with the filtered movies:

moviesdf = ratings.merge(mv, on='movieId')
moviesdf = moviesdf[['userId', 'movieId',
                     'title', 'genres', 'rating', 'ratedate']]

I assigned the target and the features:

y = moviesdf.genres
X = moviesdf.drop(['genres', 'title'], axis=1)

I encoded the features and the target:

from sklearn.preprocessing import OrdinalEncoder
encoder = OrdinalEncoder()
X[["ratedate"]] = encoder.fit_transform(X[["ratedate"]])
print(type(X))

from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y) 
print(type(y_encoded))
y_encoded

Then I tried to split the data:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, random_state=100, test_size=0.25, stratify=y)
print('X_train_shape: ' + str(X_train.shape) + '\nX_test_shape: ' + str(X_test.shape)
      + '\ny_train_shape: ' + str(y_train.shape) + '\ny_test_shape: ' + str(y_test.shape))

This raises the following error:

"ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2."
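
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: the filter on `movies` counts genres per movie, while `stratify` counts genres per rating row after the merge. A genre string shared by two movies can still end up with a single row if only one of those movies has exactly one rating. A minimal sketch with made-up toy frames (not the real CSVs) that filters on class counts after the merge instead:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for movies.csv and ratings.csv.
movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4, 5, 6],
    "genres": ["Comedy", "Comedy", "Drama", "Drama", "Horror", "Horror"],
})
ratings = pd.DataFrame({
    "userId":  [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "movieId": [1, 1, 2, 2, 3, 3, 4, 4, 5],  # movie 6 has no ratings
    "rating":  [4.0, 3.5, 5.0, 2.0, 3.0, 4.5, 4.0, 3.0, 1.0],
})

# Same filter as in the question: every genre here is shared by two movies.
mv = movies[movies["genres"].duplicated(keep=False)]
merged = ratings.merge(mv, on="movieId")

# After the merge, "Horror" has only one row, which stratify cannot split.
counts_before = merged["genres"].value_counts()
print(counts_before)

# Keep only classes with at least 2 rows in the merged frame.
class_size = merged["genres"].map(counts_before)
filtered = merged[class_size >= 2]

X = filtered[["userId", "movieId", "rating"]]
y = filtered["genres"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=100, test_size=0.25, stratify=y)
print(X_train.shape, X_test.shape)
```

With many rare classes a higher threshold than 2 may be needed, since each class should have enough rows to appear in both the train and the test split.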

Link to the CSVs: https://files.grouplens.org/datasets/movielens/ml-latest-small.zip

Inoue