I want to build a recommender system based on KNN.
First I loaded the data:
import pandas as pd
movies = pd.read_csv('Movielens/ml-small/movies.csv')
ratings = pd.read_csv('Movielens/ml-small/ratings.csv')
Then I removed the movies whose value in the target column "genres" is unique (occurs only once):
mv = movies[movies['genres'].duplicated(keep=False)]
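To illustrate what this filter does, here is a minimal sketch on a made-up frame (the `toy` data and its titles are hypothetical, not taken from the MovieLens files). `duplicated(keep=False)` marks every row whose value occurs more than once, so indexing with the mask drops the rows whose genre is unique:

```python
import pandas as pd

# Hypothetical toy data standing in for movies.csv.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["Comedy", "Drama", "Comedy"],
})

# True for every row whose genres value appears at least twice.
mask = toy["genres"].duplicated(keep=False)

# Only the two Comedy rows remain; the single Drama row is dropped.
kept = toy[mask]
print(kept)
```

Note that this counts occurrences among movies, not among ratings.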
Merged the ratings with the filtered movies:
moviesdf = ratings.merge(mv, on='movieId')
moviesdf = moviesdf[['userId', 'movieId',
                     'title', 'genres', 'rating', 'ratedate']]
Assigned the target and the features:
y = moviesdf.genres
X = moviesdf.drop(['genres', 'title'], axis=1)
Encoded the features and the target:
from sklearn.preprocessing import OrdinalEncoder
encoder = OrdinalEncoder()
X[["ratedate"]] = encoder.fit_transform(X[["ratedate"]])
print(type(X))
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
print(type(y_encoded))
y_encoded
Then I tried to split the data:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, random_state=100, test_size=0.25, stratify=y_encoded)
print(f'X_train_shape: {X_train.shape}\nX_test_shape: {X_test.shape}'
      f'\ny_train_shape: {y_train.shape}\ny_test_shape: {y_test.shape}')
This gives the error:
"ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2."
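One possible cause, which I have not confirmed on the real data: the uniqueness filter runs on the movies table, but the stratified split counts rows of the merged table. If a movie has no ratings, its genre can end up with a single row after the merge. A minimal sketch with made-up frames (all values below are hypothetical) that reproduces this and filters on the merged frame instead:

```python
import pandas as pd

# Hypothetical toy stand-ins for movies.csv and ratings.csv.
movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "genres": ["Comedy", "Comedy", "Drama", "Drama"],
})
ratings = pd.DataFrame({
    "userId": [10, 11, 12],
    "movieId": [1, 2, 3],  # movie 4 has no ratings
    "rating": [4.0, 3.5, 5.0],
})

# Every genre occurs twice in movies, so this filter removes nothing.
mv = movies[movies["genres"].duplicated(keep=False)]

# The inner merge drops movie 4, leaving Drama with a single row.
merged = ratings.merge(mv, on="movieId")
counts = merged["genres"].value_counts()
print(counts)

# Keep only genres that have at least 2 rows in the merged frame.
filtered = merged[merged["genres"].map(counts) >= 2]
print(filtered)
```

Checking `moviesdf['genres'].value_counts()` right before the split should show whether any class really has only one row.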
Link to the CSVs: https://files.grouplens.org/datasets/movielens/ml-latest-small.zip