13

I have a JavaRDD<Tuple2<String, String>> and need to convert it to a JavaPairRDD<String, String>. Currently I do this with a map function that simply returns the input tuple unchanged. Is there a better way?

YuliaSh.

5 Answers

14

JavaPairRDD.fromJavaRDD(rdd) is one solution.
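
To make the call above concrete, here is a minimal sketch run against a local Spark context. The class name, app name, and sample data are made up for illustration, and it assumes Spark's Java API is on the classpath. The result of collectAsMap() is copied into a HashMap so the caller gets an ordinary Java map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class FromJavaRddExample {

    // Builds a small RDD of tuples, wraps it as a pair RDD, and collects it as a map.
    static Map<String, String> convert() {
        // Hypothetical local configuration, for illustration only.
        SparkConf conf = new SparkConf().setAppName("fromJavaRDD-sketch").setMaster("local[1]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Tuple2<String, String>> data = Arrays.asList(
                    new Tuple2<String, String>("a", "apple"),
                    new Tuple2<String, String>("b", "banana"));
            JavaRDD<Tuple2<String, String>> tuples = sc.parallelize(data);

            // No map step needed: the static factory re-wraps the same underlying RDD.
            JavaPairRDD<String, String> pairs = JavaPairRDD.fromJavaRDD(tuples);

            return new HashMap<>(pairs.collectAsMap());
        }
    }

    public static void main(String[] args) {
        System.out.println(convert());
    }
}
```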

YuliaSh.
  • JavaRDD buildingRDD = jsc.sparkContext().parallelize(listSmartBuilding); I want to iterate over this JavaRDD. Could you help me? SmartBuildingNew is a POJO class, and jsc is the JavaStreamingContext object. – Anshul Kalra Feb 11 '16 at 02:59
4

For reverse conversion, this seems to work:

JavaRDD.fromRDD(JavaPairRDD.toRDD(rdd), rdd.classTag());
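
As a possible alternative for the reverse direction, the sketch below goes through the underlying Scala RDD with rdd().toJavaRDD(), which avoids passing a class tag explicitly. The class name, app name, and sample data are hypothetical, and it assumes Spark's Java API is on the classpath.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class PairToJavaRddExample {

    // Converts a pair RDD back to a plain RDD of tuples and formats each tuple as "key=value".
    static List<String> roundTrip() {
        // Hypothetical local configuration, for illustration only.
        SparkConf conf = new SparkConf().setAppName("pair-to-rdd-sketch").setMaster("local[1]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Tuple2<String, Integer>> data = Arrays.asList(
                    new Tuple2<String, Integer>("x", 1),
                    new Tuple2<String, Integer>("y", 2));
            JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(data);

            // rdd() exposes the underlying RDD of Tuple2; toJavaRDD() wraps it again.
            JavaRDD<Tuple2<String, Integer>> tuples = pairs.rdd().toJavaRDD();

            return tuples.map(t -> t._1() + "=" + t._2()).collect();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```
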
Michal Čizmazia
2

Try this example:

JavaRDD<Tuple2<Integer, String>> mutate = mutateFunction(rdd_world); //goes to a method that generates the RDD with a Tuple2 from a rdd_world RDD
JavaPairRDD<Integer, String> pairs = JavaPairRDD.fromJavaRDD(mutate);
3xCh1_23
2

Try this to transform a JavaRDD into a JavaPairRDD. It works perfectly for me.

JavaRDD<Sensor> sensorRdd = lines.map(new SensorData()).cache();
// transform data into javaPairRdd
JavaPairRDD<Integer, Sensor> deviceRdd = sensorRdd.mapToPair(new PairFunction<Sensor, Integer, Sensor>() {
    @Override
    public Tuple2<Integer, Sensor> call(Sensor sensor) throws Exception {
        // key each sensor by its numeric id
        return new Tuple2<Integer, Sensor>(Integer.parseInt(sensor.getsId().trim()), sensor);
    }
});
Maciej Dobrowolski
Rajeev Rathor
1

Alternatively, you can call mapToPair(..) on your instance of org.apache.spark.api.java.JavaRDD.
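
A minimal sketch of mapToPair with a Java 8 lambda follows. The class name, app name, and sample words are illustrative assumptions, and Spark's Java API is assumed to be on the classpath. It keys each word by its length.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class MapToPairExample {

    // Keys each word by its length and collects the result as a map.
    static Map<Integer, String> keyByLength() {
        // Hypothetical local configuration, for illustration only.
        SparkConf conf = new SparkConf().setAppName("mapToPair-sketch").setMaster("local[1]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> words = sc.parallelize(Arrays.asList("a", "bb", "ccc"));

            // The lambda stands in for a PairFunction<String, Integer, String>.
            JavaPairRDD<Integer, String> byLength =
                    words.mapToPair(w -> new Tuple2<Integer, String>(w.length(), w));

            return new HashMap<>(byLength.collectAsMap());
        }
    }

    public static void main(String[] args) {
        System.out.println(keyByLength());
    }
}
```

When the elements are already Tuple2 values, as in the question, the lambda only has to return its argument, which is the identity mapping the question describes.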

preeze