I have a CSV file with 600000 unique user_ids and 70000 unique products. I plan to implement an ALS-based recommender system, and for that I want to apply a crosstab function on the user_ids and products. This raises a memory error in pandas and a 1e4 limit error in PySpark. Can anyone suggest a way to solve this problem? I would also be happy to get suggestions on how to use the data to implement a recommender system with the ALS method. Thanks in advance.
-
You probably need to use a sparse matrix. Maybe this question can help: http://stackoverflow.com/questions/38134370/how-to-build-a-sparse-matrix-in-pyspark – gereleth Mar 27 '17 at 07:36
-
Thanks, sparse matrix was the solution. – Karthick Mar 28 '17 at 16:23
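
For anyone landing here later, a rough sketch of the sparse-matrix idea from the comments. A dense crosstab of 600000 x 70000 has 4.2e10 cells, which is about 336 GB at 8 bytes per cell, so it cannot fit in memory, while a sparse matrix stores only the (user, product) pairs that actually occur. The function name `build_interaction_matrix` and the sample ids are made up for illustration, and it assumes the two columns have already been read from the CSV (for example as pandas Series) and that NumPy and SciPy are available.

```python
import numpy as np
from scipy.sparse import csr_matrix


def build_interaction_matrix(user_ids, product_ids, values=None):
    """Build a sparse user x product matrix from parallel id columns.

    Hypothetical helper: user_ids and product_ids are equal-length
    sequences with one entry per observed interaction.
    """
    # np.unique(..., return_inverse=True) maps each raw id to a dense
    # 0-based index, which is what the sparse constructor expects.
    users, user_idx = np.unique(np.asarray(user_ids), return_inverse=True)
    products, product_idx = np.unique(np.asarray(product_ids), return_inverse=True)

    if values is None:
        # Assumption: no explicit rating, so every interaction counts as 1.
        values = np.ones(len(user_idx), dtype=np.float32)

    # Duplicate (user, product) pairs are summed when the CSR matrix is
    # built, so the result behaves like a crosstab of counts.
    matrix = csr_matrix(
        (np.asarray(values), (user_idx.ravel(), product_idx.ravel())),
        shape=(len(users), len(products)),
    )
    return matrix, users, products


# Tiny made-up sample: user "u1" interacts with product "p1" twice.
sample_users = ["u1", "u2", "u1", "u3", "u1"]
sample_products = ["p1", "p1", "p2", "p3", "p1"]
matrix, users, products = build_interaction_matrix(sample_users, sample_products)
print(matrix.shape, matrix.nnz)
```

The returned `users` and `products` arrays let you map row and column indices back to the original ids. On the PySpark side, `pyspark.ml.recommendation.ALS` is fitted on a long-format DataFrame of (user, item, rating) rows with integer ids, so as far as I can tell no crosstab is needed there at all.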