
I've generated a model using OpenNLP, and now I want to read the model in Spark (with Scala), build an RDD, and then use the model to predict some values.

Is there a way to load file types in Scala other than .txt, .csv, and .parquet?

Thanks.

desertnaut
Frody

2 Answers


What you want to load is a model, not data. If the model you have built is serializable, you can define a global singleton object that holds the model and a function that does the prediction, and then use that function in an RDD map. For example (assuming a document-categorizer model, as in the question; the model path is just a placeholder):

import opennlp.tools.doccat.{DoccatModel, DocumentCategorizerME}

object OpenNLPModel {
  // Load the doccat model lazily, once per executor JVM (path is illustrative)
  lazy val categorizer = new DocumentCategorizerME(
    new DoccatModel(new java.io.FileInputStream("/path/to/model.bin")))
  // Return the best-scoring category for the whitespace-tokenized input
  def predict(s: String): String =
    categorizer.getBestCategory(categorizer.categorize(s.split("\\s+")))
}

myRdd.map(OpenNLPModel.predict)

Read the Spark programming guide for more information.
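For completeness, here is a minimal sketch of how such a singleton could be used from a Spark job. The application name, input path, and sample size are illustrative, and the model file has to be readable from every executor (for example, shipped with --files or read from HDFS as in the answer below):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("opennlp-predict"))

// Each executor initializes OpenNLPModel lazily on first use,
// so the model object itself is never serialized with the closure.
val predictions = sc
  .textFile("hdfs:///data/documents.txt") // one document per line (illustrative path)
  .map(OpenNLPModel.predict)

predictions.take(10).foreach(println)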

Dimosthenis

I've just found the answer.

import java.io.IOException;
import java.net.URI;

import opennlp.tools.doccat.DoccatModel;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public DoccatModel read(String path) throws IOException {
    Configuration conf = new Configuration();

    // Get the filesystem - HDFS
    FileSystem fs = FileSystem.get(URI.create(path), conf);
    FSDataInputStream in = null;
    DoccatModel model = null;

    try {
        // Open the path in HDFS and deserialize the OpenNLP model from the stream
        in = fs.open(new Path(path));
        model = new DoccatModel(in);
    } finally {
        IOUtils.closeStream(in);
    }

    return model;
}

You have to use Hadoop's FileSystem class to read the model file from HDFS.
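If you prefer to stay in Scala, the same read can be written directly against the Hadoop FileSystem API. A minimal sketch, assuming the model is an OpenNLP DoccatModel and the path is an hdfs:// URI:

import java.net.URI

import opennlp.tools.doccat.DoccatModel
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Scala equivalent of the Java snippet above: open the HDFS path and
// deserialize the model from the stream, closing the stream afterwards.
def readModel(path: String): DoccatModel = {
  val fs = FileSystem.get(URI.create(path), new Configuration())
  val in = fs.open(new Path(path))
  try new DoccatModel(in) finally in.close()
}

Calling this from inside a singleton object, as in the other answer, keeps the model loaded only once per executor JVM.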

Cheers!

Frody