pyspark/pandas dataframe crosstab function limits

Asked Mar 26 '17 at 21:05

Active Mar 26 '17 at 21:05

Viewed 558 times

I have a CSV file with 600000 unique user_ids and 70000 unique products. I plan to implement an ALS-based recommender system, and for that I want to apply the crosstab function to user_ids and unique_products. This raises a memory error in pandas and hits the 1e4 limit error in pyspark. Can anyone suggest a way to solve this problem? I would also be happy to get suggestions on how to use the data to implement a recommender system with the ALS method. Thanks in advance.

Karthick

You probably need to use a sparse matrix. Maybe this question can help: http://stackoverflow.com/questions/38134370/how-to-build-a-sparse-matrix-in-pyspark – gereleth Mar 27 '17 at 07:36
Thanks, sparse matrix was the solution. – Karthick Mar 28 '17 at 16:23
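
For anyone landing here later: a dense crosstab of 600000 x 70000 has 4.2e10 cells, which is why pandas runs out of memory. Below is a minimal sketch of the sparse-matrix approach the comments point to, using pandas and scipy. The column names `user_id` and `product_id`, the toy data, and the helper `sparse_crosstab` are assumptions for illustration, not taken from the original CSV.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical long-format interactions, one row per (user, product) event.
df = pd.DataFrame({
    "user_id": ["u1", "u2", "u1", "u3", "u2", "u1"],
    "product_id": ["p1", "p1", "p2", "p3", "p1", "p2"],
})


def sparse_crosstab(frame, row_col, col_col):
    """Count (row, column) co-occurrences like pd.crosstab, storing only nonzero cells."""
    # factorize maps each distinct label to an integer code, in order of first appearance
    row_codes, row_labels = pd.factorize(frame[row_col])
    col_codes, col_labels = pd.factorize(frame[col_col])
    ones = np.ones(len(frame), dtype=np.int32)
    # Duplicate (row, col) pairs are summed when the CSR matrix is built
    matrix = csr_matrix(
        (ones, (row_codes, col_codes)),
        shape=(len(row_labels), len(col_labels)),
    )
    return matrix, row_labels, col_labels


matrix, users, products = sparse_crosstab(df, "user_id", "product_id")
print(matrix.shape, matrix.nnz)
```

Memory then scales with the number of observed (user, product) pairs rather than users x products. As far as I know, Spark's ALS is fitted on the long-format (user, item, rating) rows directly, so on the pyspark side the crosstab step may not be needed at all.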

0 Answers