I've generated a model using OpenNLP, and now I want to read the model in Spark (with Scala) as an RDD, then use it to predict some values.
Is there a way to load file types other than .txt, .csv, and .parquet in Spark with Scala?
Thanks.
What you want to load is a model, not data. If the model you have built is serializable, you can define a global singleton object that holds the model and a function that does the prediction, then use that function in an RDD map. For example:
object OpenNLPModel {
  val model = ??? // load the OpenNLP model here
  def predict(s: String): String = model.predict(s)
}

myRdd.map(OpenNLPModel.predict)
Read the Spark programming guide for more information.
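As a fuller sketch of that singleton pattern using OpenNLP's actual document-categorizer API (the model path and RDD name are placeholders; this assumes a recent OpenNLP where `categorize` takes a token array):

```scala
import java.io.FileInputStream
import opennlp.tools.doccat.{DoccatModel, DocumentCategorizerME}
import opennlp.tools.tokenize.WhitespaceTokenizer

// Singleton: the model is loaded once per JVM (i.e., once per executor),
// instead of being serialized and shipped with every task.
object OpenNLPModel {
  private val model: DoccatModel = {
    val in = new FileInputStream("/path/to/doccat.bin") // placeholder path
    try new DoccatModel(in) finally in.close()
  }
  private val categorizer = new DocumentCategorizerME(model)

  def predict(s: String): String = {
    val tokens = WhitespaceTokenizer.INSTANCE.tokenize(s)
    categorizer.getBestCategory(categorizer.categorize(tokens))
  }
}

// Usage: each executor initializes its own copy when the object is first touched.
// myRdd.map(OpenNLPModel.predict)
```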
I've just found the answer:
import java.io.IOException;
import java.net.URI;
import opennlp.tools.doccat.DoccatModel;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public DoccatModel read(String path) throws IOException {
    Configuration conf = new Configuration();
    // Get the filesystem - HDFS
    FileSystem fs = FileSystem.get(URI.create(path), conf);
    FSDataInputStream in = null;
    DoccatModel model = null;
    try {
        // Open the path mentioned in HDFS
        in = fs.open(new Path(path));
        model = new DoccatModel(in);
    } finally {
        IOUtils.closeStream(in);
    }
    return model;
}
You have to use Hadoop's FileSystem class to read a file from HDFS.
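A hedged sketch of how this helper might then be used from Spark in Scala (the `ModelLoader` object, the RDD name, and the HDFS path are all placeholders): call the HDFS `read` inside `mapPartitions`, so the model is loaded once per partition on the executor instead of being serialized from the driver:

```scala
import opennlp.tools.doccat.DocumentCategorizerME
import opennlp.tools.tokenize.WhitespaceTokenizer

val modelPath = "hdfs:///models/doccat.bin" // placeholder path

// myRdd: RDD[String] of documents to classify
val predictions = myRdd.mapPartitions { docs =>
  // ModelLoader.read is assumed to wrap the HDFS-reading helper above
  // and to be on the executors' classpath.
  val model = ModelLoader.read(modelPath)
  val categorizer = new DocumentCategorizerME(model)
  docs.map { doc =>
    val tokens = WhitespaceTokenizer.INSTANCE.tokenize(doc)
    categorizer.getBestCategory(categorizer.categorize(tokens))
  }
}
```

Loading per partition trades a little startup cost on each task batch for never having to make the model serializable.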
Cheers!