Spark Version: 1.6.1
I have recently refactored our Word2Vec code to move to the DataFrame-based ml models, but I am having problems serializing and loading the model locally.
I am able to successfully:
- Fit the dataframe and create the model.
- Retrieve synonyms.
When I try to serialize the model locally, the vectors are not serialized, so the resulting file is far too small: approx. 2 KB for a model trained on 10 GB of data.
// Plain Java serialization of the fitted ml.feature.Word2VecModel
try (FileOutputStream fo = new FileOutputStream("/tmp/word2vec");
     ObjectOutputStream so = new ObjectOutputStream(fo)) {
    so.writeObject(word2VecModel);
}
logger.info("Word2Vec model saved");
Loading the model back and calling findSynonyms() results in the exception below:
java.lang.NullPointerException at org.apache.spark.ml.feature.Word2VecModel.transform(Word2Vec.scala:224)
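My working theory (an assumption on my part, not confirmed from the Spark source) is that the model's word-vector table lives in a field that plain Java serialization skips, such as a transient field. That would explain both the ~2 KB file and the NullPointerException once the reloaded model is used. A self-contained Java sketch of that failure mode, with made-up names (ToyModel, vectors):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Toy model: the word-vector map is marked transient, so Java
// serialization silently omits it from the byte stream.
class ToyModel implements Serializable {
    private static final long serialVersionUID = 1L;
    transient Map<String, float[]> vectors = new HashMap<>();

    ToyModel() {
        vectors.put("spark", new float[] {1f, 2f});
    }

    // Serialize and deserialize the model entirely in memory.
    static ToyModel roundTrip(ToyModel m) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(m); // transient field is not written
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ToyModel) in.readObject(); // field comes back null
        }
    }
}

public class TransientDemo {
    public static void main(String[] args) throws Exception {
        ToyModel restored = ToyModel.roundTrip(new ToyModel());
        // The map is null after reload; any lookup on it now throws a
        // NullPointerException, matching the symptom from findSynonyms().
        System.out.println("vectors after reload: " + restored.vectors);
    }
}
```

The serialized bytes contain only the class metadata and the non-transient fields, which is why the file stays tiny no matter how large the training data was.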
Is there a way to save the model locally?