Is it possible to read a GRIB2 file from HDFS into an RDD via the Spark API? I found JavaSparkContext.binaryFiles, but the returned RDD contains cryptic data (not human readable).
I'm using Spark 1.6.1 and the Java API. Thank you!
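import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.InputStreamReader;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.input.PortableDataStream;
import scala.Tuple2;

String inputFile = "hdfs://hdfs:8020/data/testdata.bin";
SparkConf sparkConf = SparkConfFactory.createSparkConf("WeatherData"); // project-specific helper, not a Spark class
JavaSparkContext sc = new JavaSparkContext(sparkConf);
JavaPairRDD<String, PortableDataStream> inputRdd = sc.binaryFiles(inputFile); // one (path, stream) pair per file
List<Tuple2<String, PortableDataStream>> asList = inputRdd.collect();
for (Tuple2<String, PortableDataStream> a : asList) {
    System.out.println(a._1()); // Key = file path
    DataInputStream in = new DataInputStream(a._2().open());
    // InputStreamReader interprets the raw binary bytes as characters
    BufferedReader d = new BufferedReader(new InputStreamReader(in));
    String line;
    while ((line = d.readLine()) != null) { // ready() is not a reliable end-of-stream check
        System.out.println(line); // Cryptic output
    }
    d.close();
}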
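GRIB2 is a binary format, so decoding its bytes as text lines will always look cryptic. The bytes themselves are structured: every GRIB2 message starts with a 16-octet Indicator Section (Section 0) holding the ASCII marker "GRIB", the discipline (octet 7), the edition number (octet 8, which is 2) and the total message length (octets 9-16), and ends with "7777". The following is a minimal sketch, assuming a file contains GRIB2 messages laid back to back with nothing in between. It reads Section 0 of each message and splits the byte array into individual messages. The class Grib2Indicator and its methods are hypothetical illustrations, and the sketch does not decode any grid values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads GRIB2 Section 0 and splits concatenated messages. */
public class Grib2Indicator {

    /** Fields of Section 0 that this sketch extracts. */
    public static final class Header {
        public final int discipline;   // octet 7
        public final int edition;      // octet 8, 2 for GRIB2
        public final long totalLength; // octets 9-16, length of the whole message in bytes

        Header(int discipline, int edition, long totalLength) {
            this.discipline = discipline;
            this.edition = edition;
            this.totalLength = totalLength;
        }
    }

    /** Parses Section 0 starting at offset; throws if the "GRIB" marker is missing. */
    public static Header parse(byte[] data, int offset) {
        if (data.length - offset < 16) {
            throw new IllegalArgumentException("fewer than 16 bytes at offset " + offset);
        }
        String magic = new String(data, offset, 4, StandardCharsets.US_ASCII);
        if (!"GRIB".equals(magic)) {
            throw new IllegalArgumentException("no GRIB marker at offset " + offset);
        }
        int discipline = data[offset + 6] & 0xFF;
        int edition = data[offset + 7] & 0xFF;
        // ByteBuffer is big-endian by default, matching the GRIB byte order
        long totalLength = ByteBuffer.wrap(data, offset + 8, 8).getLong();
        return new Header(discipline, edition, totalLength);
    }

    /**
     * Splits a buffer into GRIB2 messages using the length stored in each Section 0.
     * Assumption: messages are contiguous, with no padding or other bytes between them.
     */
    public static List<byte[]> split(byte[] data) {
        List<byte[]> messages = new ArrayList<>();
        int offset = 0;
        while (offset < data.length) {
            Header h = parse(data, offset);
            if (h.edition != 2) {
                throw new IllegalArgumentException("not GRIB edition 2 at offset " + offset);
            }
            // 16 bytes of Section 0 plus the 4-byte "7777" end section is the minimum
            if (h.totalLength < 20 || h.totalLength > data.length - offset) {
                throw new IllegalArgumentException("bad message length at offset " + offset);
            }
            int len = (int) h.totalLength;
            String end = new String(data, offset + len - 4, 4, StandardCharsets.US_ASCII);
            if (!"7777".equals(end)) {
                throw new IllegalArgumentException("missing 7777 end marker at offset " + offset);
            }
            byte[] msg = new byte[len];
            System.arraycopy(data, offset, msg, 0, len);
            messages.add(msg);
            offset += len;
        }
        return messages;
    }
}
```

In the Spark job, the byte array could come from PortableDataStream.toArray(), which reads the whole file into memory on the executor, so this suits files that fit in executor memory. Reading the actual field values still needs a full GRIB2 decoder, for example Unidata's NetCDF-Java library.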