I'm trying to create a DataFrame that, for each user, lists every movie with a binary flag showing whether the user has seen it. Right now I'm using the MovieLens small dataset and trying to get a result like this:
   1  2  3  4  5  6  etc
1  0  0  1  1  0  1  etc
2  1  0  0  0  1  1  etc
3  0  1  1  0  0  1  etc
The original DataFrames look like this:
   movieId                     title
0        1          Toy Story (1995)
1        2            Jumanji (1995)
2        3   Grumpier Old Men (1995)
3        4  Waiting to Exhale (1995)

   userId  movieId  rating  timestamp
0       1        1     4.0  964982703
1       1        3     4.0  964981247
2       1        6     4.0  964982224
3       1       47     5.0  964983815
4       1       50     5.0  964982931
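To make the problem reproducible without downloading the dataset, the rows printed above can be rebuilt by hand. This is a minimal sketch: the name `movies_df` for the movies table is an assumption (only `ratings_df` appears in the original code), and only the rows shown above are included.

```python
import pandas as pd

# First rows of the movies table, copied from the printout above.
# `movies_df` is an assumed name; the question only names `ratings_df`.
movies_df = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["Toy Story (1995)", "Jumanji (1995)",
              "Grumpier Old Men (1995)", "Waiting to Exhale (1995)"],
})

# First rows of the ratings table, copied from the printout above.
ratings_df = pd.DataFrame({
    "userId": [1, 1, 1, 1, 1],
    "movieId": [1, 3, 6, 47, 50],
    "rating": [4.0, 4.0, 4.0, 5.0, 5.0],
    "timestamp": [964982703, 964981247, 964982224, 964983815, 964982931],
})

print(movies_df)
print(ratings_df)
```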
In the desired result, the index holds user IDs and the columns hold movie IDs. I tried solving this with a nested list comprehension:
from tqdm import tqdm  # progress bar for the inner (movie) loop
pd.DataFrame(data=[[1 if movie_id in ratings_df.loc[ratings_df["userId"] == user_id, "movieId"].values else 0 for movie_id in tqdm(range(1, last_movie + 1))] for user_id in range(1, last_user + 1)], columns=movie_columns)
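One subtlety with a membership test like `movie_id in some_series`: the `in` operator on a pandas Series checks the index labels, not the values, so it needs `.values` (or `.isin`) to test against the movie IDs themselves. The toy Series below is a hypothetical stand-in for one user's filtered `movieId` column.

```python
import pandas as pd

# Movies rated by one user: index labels are 0, 1, 2; values are movie IDs.
seen = pd.Series([1, 3, 47])

# `in` on a Series tests the index labels, not the values.
print(2 in seen)          # True: 2 is an index label
print(47 in seen)         # False: 47 is a value, not a label
print(47 in seen.values)  # True: membership test against the values
```

In the original comprehension the filtered Series keeps the row labels from the full ratings table, so without `.values` the result depends on row positions rather than on which movies the user rated.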
But this runs far too slowly.
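The nested comprehension re-filters the entire ratings table once for every (user, movie) pair, so the work grows with users × movies × ratings. A vectorized alternative is sketched below using `pd.crosstab`, which counts every (userId, movieId) pair in a single pass; this is a swapped-in technique rather than a fix of the loop above. The helper name `seen_matrix` and the `movies_df` name for the movies table are hypothetical.

```python
import pandas as pd


def seen_matrix(ratings_df: pd.DataFrame, movies_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: users x movies DataFrame, 1 if the user rated the movie."""
    # Count ratings for each (userId, movieId) pair in one vectorized pass.
    counts = pd.crosstab(ratings_df["userId"], ratings_df["movieId"])
    # Turn any positive count into 1 so the result is strictly binary.
    seen = counts.clip(upper=1)
    # Add a column of zeros for every movie that nobody rated,
    # and order the columns the same way as movies_df.
    return seen.reindex(columns=movies_df["movieId"], fill_value=0)


# Usage on a tiny hypothetical dataset: movie 4 has no ratings.
ratings = pd.DataFrame({"userId": [1, 1, 2, 3, 3], "movieId": [1, 3, 2, 2, 3]})
movies = pd.DataFrame({"movieId": [1, 2, 3, 4]})
print(seen_matrix(ratings, movies))
```

The `reindex` step matters because `crosstab` only creates columns for movies that appear in the ratings; reindexing against the movies table keeps unrated movies as all-zero columns. The result is a dense frame; for much larger datasets a sparse representation may be a better fit.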