Spark Version: 1.6.1

I have recently refactored our Word2Vec code to move to the DataFrame-based ML models, but I am having problems serializing and loading the model locally.

I am able to successfully:

  1. Fit the dataframe and create the model.
  2. Retrieve synonyms.

When I try to serialize the model locally, the vectors are not serialized, so the resulting file is far too small: approximately 2 KB for a model trained on 10 GB of data.

        // Standard Java serialization of the fitted model
        try (FileOutputStream fo = new FileOutputStream("/tmp/word2vec");
             ObjectOutputStream so = new ObjectOutputStream(fo)) {
            so.writeObject(word2VecModel);
            so.flush();
        }
        logger.info("Word2Vec model saved");

Loading the model and calling findSynonyms() results in the exception below:

        java.lang.NullPointerException
            at org.apache.spark.ml.feature.Word2VecModel.transform(Word2Vec.scala:224)

Is there a way to save the model locally?

skgemini

1 Answer

Have you tried the model persistence functionality that is now included out of the box? You can save either a single model or an entire pipeline. I tried it and it worked.
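A minimal sketch of what this looks like, assuming the spark.ml `MLWritable`/`MLReadable` API (`model.write().save(...)` and `Word2VecModel.load(...)`). Note that built-in read/write support for `Word2VecModel` may require a Spark version newer than 1.6.1, and the path below is just an example:

```java
import java.io.IOException;
import org.apache.spark.ml.feature.Word2VecModel;

public class Word2VecPersistence {

    // Save the fitted model using Spark's own persistence instead of
    // Java serialization; overwrite() avoids "path already exists"
    // errors on re-runs. The path is a hypothetical example.
    public static void saveModel(Word2VecModel model, String path) throws IOException {
        model.write().overwrite().save(path);
    }

    // Reload the model; findSynonyms()/transform() work on the loaded
    // copy because the word vectors are persisted along with the params.
    public static Word2VecModel loadModel(String path) {
        return Word2VecModel.load(path);
    }
}
```

If you fitted the model inside a `Pipeline`, you can save the whole `PipelineModel` the same way and reload it with `PipelineModel.load(path)`.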