
I am trying to join two pair RDDs as shown below, where:

lat1: (K, V), where K is an Integer and V is a Double
lat2: (K, V), where K is an Integer and V is a Double

   JavaPairRDD<Integer,Tuple2<Double,Double>> latlong = lat.join(long);

I am assuming the new RDD will be (K, [V1, V2]), and I want to display it.

Also, if I want to perform operations based on the values, how should I do that?

Please suggest a solution using the Spark Java API.

P.S.: I have seen many answers in Scala, but my requirement is to implement this in Java.

swagath001

1 Answer


From the Spark documentation:

When join is called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.

So you are right with this assumption:

JavaPairRDD<Integer,Tuple2<Double,Double>> latlong = lat.join(long);
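
To make the (K, (V, W)) shape concrete, here is a minimal plain-Java sketch that imitates the inner-join semantics quoted above, without using Spark at all. The class `JoinSketch`, the helper `join`, and the sample values are hypothetical and only serve as an illustration. Spark itself does not guarantee any particular ordering of the joined pairs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class JoinSketch {
    // Hypothetical helper (not a Spark API): pairs every left value with
    // every right value that shares the same key, like an inner join.
    static List<String> join(List<Map.Entry<Integer, Double>> left,
                             List<Map.Entry<Integer, Double>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, Double> l : left) {
            for (Map.Entry<Integer, Double> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    // Rendered as (K,(V,W)) to mirror the shape of the joined pairs.
                    out.add("(" + l.getKey() + ",(" + l.getValue() + "," + r.getValue() + "))");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, Double>> lat =
                List.of(Map.entry(1, 12.5), Map.entry(2, 48.1));
        List<Map.Entry<Integer, Double>> lon =
                List.of(Map.entry(1, 77.6), Map.entry(1, 80.0), Map.entry(3, 11.6));
        // Prints (1,(12.5,77.6)) and (1,(12.5,80.0)); keys 2 and 3 have no match and are dropped.
        join(lat, lon).forEach(System.out::println);
    }
}
```

Note that key 1 appears twice on the right side, so it produces two joined pairs: this is what "all pairs of elements for each key" means.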

When you need to work with the values in a JavaPairRDD, you can use the #mapValues() method:

Pass each value in the key-value pair RDD through a map function without changing the keys; this also retains the original RDD's partitioning.

To display the JavaPairRDD, you can use the usual output methods, e.g. #saveAsTextFile().
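
For a small RDD, another option is to bring the pairs back to the driver with collect() and print them. The following is a minimal sketch, assuming Spark is on the classpath and a local master is acceptable; the class name `DisplayJoined` and the sample data are made up for illustration. The second RDD is named `lon` here because `long` is a reserved word in Java and cannot be used as a variable name.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

class DisplayJoined {
    public static void main(String[] args) {
        // Assumption: run locally; adjust the master for a real cluster.
        SparkConf conf = new SparkConf().setAppName("DisplayJoined").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Made-up sample data standing in for the two input RDDs.
            List<Tuple2<Integer, Double>> latData =
                    Arrays.asList(new Tuple2<>(1, 12.5), new Tuple2<>(2, 48.1));
            List<Tuple2<Integer, Double>> lonData =
                    Arrays.asList(new Tuple2<>(1, 77.6), new Tuple2<>(2, 11.6));

            JavaPairRDD<Integer, Double> lat = sc.parallelizePairs(latData);
            JavaPairRDD<Integer, Double> lon = sc.parallelizePairs(lonData);

            JavaPairRDD<Integer, Tuple2<Double, Double>> latlong = lat.join(lon);

            // collect() copies the whole RDD to the driver, so use it only for small results.
            for (Tuple2<Integer, Tuple2<Double, Double>> entry : latlong.collect()) {
                System.out.println(entry._1() + " -> " + entry._2()._1() + ", " + entry._2()._2());
            }
        }
    }
}
```

For larger data sets, take(n) instead of collect() limits how many pairs are pulled to the driver.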


When you need to map the values in (K, (V, W)) to something else, such as (K, V-W), you can use the mapValues() transformation mentioned above:

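import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

// Keys stay unchanged; each (V, W) value is turned into the string "V-W".
JavaPairRDD<Integer, String> pairs = latlong.mapValues(
        new Function<Tuple2<Double, Double>, String>() {
          @Override
          public String call(Tuple2<Double, Double> value) throws Exception {
            // _1() is the value from the left RDD, _2() the value from the right RDD.
            return value._1() + "-" + value._2();
          }
        });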
vanekjar