After some complicated aggregations I ended up with a "JavaPairRDD<HashSet<String>, HashMap<String, Double>>" RDD and want to save the result to a file. I believe saveAsHadoopFile is a good API for this, but I am having trouble filling in the parameters for saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass, CompressionCodec). Can anyone help?

daydayup

1 Answer

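You can save the RDD as text with the following call and later parse the output back into the desired structure. Each pair is written as one line using its toString() representation.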

rdd.saveAsTextFile("hdfs:///complete_path_to_hdfs_file/");
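Parsing the default toString() output back into a set and a map can be brittle, so one option is to format each pair explicitly before saving. The following is only a minimal sketch in plain Java: PairLineCodec and its methods are hypothetical helpers, not part of Spark or Hadoop, and the line format assumes the strings never contain a tab, a comma, or '='.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper (not a Spark or Hadoop class): turns one (key, value)
// pair into a single text line and parses that line back.
// Assumption: no string contains a tab, a comma, or '='.
public class PairLineCodec {

    // Line layout: key elements joined by ',', then a tab, then name=value
    // entries joined by ','. Sorted copies keep the output deterministic.
    public static String format(HashSet<String> key, HashMap<String, Double> value) {
        String keyPart = String.join(",", new TreeSet<>(key));
        StringBuilder valuePart = new StringBuilder();
        for (Map.Entry<String, Double> e : new TreeMap<>(value).entrySet()) {
            if (valuePart.length() > 0) {
                valuePart.append(',');
            }
            valuePart.append(e.getKey()).append('=').append(e.getValue());
        }
        return keyPart + "\t" + valuePart;
    }

    // Rebuilds the key set from the part before the tab.
    public static HashSet<String> parseKey(String line) {
        String keyPart = line.split("\t", -1)[0];
        HashSet<String> key = new HashSet<>();
        if (!keyPart.isEmpty()) {
            for (String element : keyPart.split(",")) {
                key.add(element);
            }
        }
        return key;
    }

    // Rebuilds the value map from the part after the tab.
    public static HashMap<String, Double> parseValue(String line) {
        String valuePart = line.split("\t", -1)[1];
        HashMap<String, Double> value = new HashMap<>();
        if (!valuePart.isEmpty()) {
            for (String entry : valuePart.split(",")) {
                String[] nameAndNumber = entry.split("=", 2);
                value.put(nameAndNumber[0], Double.parseDouble(nameAndNumber[1]));
            }
        }
        return value;
    }

    public static void main(String[] args) {
        HashSet<String> key = new HashSet<>();
        key.add("b");
        key.add("a");
        HashMap<String, Double> value = new HashMap<>();
        value.put("x", 1.5);
        String line = format(key, value);
        // Round trip: the parsed key and value equal the originals.
        System.out.println(parseKey(line).equals(key) && parseValue(line).equals(value));
    }
}
```

With a helper like this, the pairs could be mapped to strings first, for example rdd.map(t -> PairLineCodec.format(t._1(), t._2())), and the resulting RDD of strings saved with saveAsTextFile.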

If you want to use the saveAsHadoopFile API instead, the following call can be used:

rdd.saveAsHadoopFile(complete_path_to_file, HashSet.class, HashMap.class, TextOutputFormat.class);

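You can also pass another class that implements org.apache.hadoop.mapred.OutputFormat as the outputFormatClass parameter. The five-argument overload additionally takes a compression codec class, such as GzipCodec.class, as its last parameter.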

For more information, refer to the HadoopFile link.

Devendra Singh
How do we write it as an Avro file? I tried `pairRdd.saveAsHadoopFile("/user/cloudera/avro/", String.class, Float.class, AvroOutputFormat.class);` and got a `NullPointerException`. – Amber, Jun 27 '18 at 11:26