from pyspark.ml.recommendation import ALS, ALSModel
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.mllib.evaluation import RegressionMetrics, RankingMetrics
from pyspark.ml.evaluation import RegressionEvaluator

als = ALS(maxIter=15,
          regParam=0.08,
          userCol="ID User",
          itemCol="ID Film",
          ratingCol="Rating",
          rank=20,
          numItemBlocks=30,
          numUserBlocks=30,
          alpha=0.95,
          nonnegative=True,
          coldStartStrategy="drop",
          implicitPrefs=False)
model = als.fit(training_dataset)

model.save('model')

Every time I call the save method, the Jupyter notebook gives me an error like this:

An error occurred while calling o477.save.
: org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:106)

I'm aware of previous SO questions and answers on this, and I have tried the following:

model.save('model')

model.write().save("saved_model")

als.write().save("saved_model")

als.save('model')

import pickle
s = pickle.dumps(als)

als_path = "from_C:Folder_to_my_project_root" + "/als"
als.save(als_path)

My question is: how do I save the ALS model so that I can load it later, without retraining it every time I run the program?
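
For completeness, this is the save/load pairing I am aiming for so the model doesn't have to be retrained (a minimal sketch; "saved_model" is just a placeholder path):

from pyspark.ml.recommendation import ALSModel

# Save the fitted model with Spark's built-in writer; overwrite() avoids
# "path already exists" errors on reruns.
model.write().overwrite().save("saved_model")

# In a later run, load it back without retraining.
loaded_model = ALSModel.load("saved_model")
recommendations = loaded_model.recommendForAllUsers(10)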

Michael Halim
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jun 07 '22 at 11:46

2 Answers


I used to run into this problem when running recommendations on the Netflix Prize dataset, about 100 million records in total. This is what I did: start by running on 50% of the data, then slowly increase the percentage and see where it breaks. In my case I was able to work my way back up to 100% of the data. Closing unnecessary Chrome tabs also helps.
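
A rough sketch of that incremental approach, reusing the als estimator and training_dataset from the question (the fractions and output paths are arbitrary):

# Fit on growing fractions of the data to find the point where it breaks.
for fraction in [0.5, 0.75, 1.0]:
    subset = training_dataset.sample(fraction=fraction, seed=42)
    model = als.fit(subset)
    model.write().overwrite().save(f"als_model_{int(fraction * 100)}pct")
    print(f"fitted and saved on {int(fraction * 100)}% of the data")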

angbear

Basically, o477 (and oXXX errors in general) means something went wrong while Spark was executing the job. Since it seems you're building a movie recommender, I assume you're using the MovieLens or Netflix dataset. It can mean one of these (see the sketch after the list):

  1. The file is too big and can't be pickled
  2. The model is too complex and your memory runs out
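
If it turns out to be memory, one thing worth trying (a sketch, not a guaranteed fix; the memory values are examples you should adjust to your machine) is giving the driver and executors more memory when the SparkSession is created, and saving with Spark's built-in writer instead of pickle, since the model's factor matrices live in distributed DataFrames:

from pyspark.sql import SparkSession

# Example memory settings; these must be set before the session/JVM starts.
spark = (SparkSession.builder
         .appName("als-recommender")
         .config("spark.driver.memory", "8g")
         .config("spark.executor.memory", "8g")
         .getOrCreate())

# Fit the model as in the question, then save with Spark's own writer.
model.write().overwrite().save("saved_model")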