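import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

JavaRDD<String> input = xyz.sc.textFile("/home/spark/Documents/XYZ");
// Key each line by its first two characters, group into 12 partitions,
// then flatten the grouped values back into individual lines.
JavaRDD<String> infoRDD = input
    .mapToPair(new PairFunction<String, String, String>() {
        public Tuple2<String, String> call(String x) {
            return new Tuple2<String, String>(x.substring(0, 2), x);
        }
    })
    .groupByKey(12)
    .flatMap(new FlatMapFunction<Tuple2<String, Iterable<String>>, String>() {
        public Iterable<String> call(Tuple2<String, Iterable<String>> t) throws Exception {
            return t._2();
        }
    });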

Above is my code, where I am distributing data across partitions based on a key. However, some partitions end up holding data for two different keys, whereas I expect all data for a single key to be stored on its own partition.

EXPECTED

key1(data)->partition1
key2(data)->partition2
key3(data)->partition3
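
For reference, here is a minimal plain-Java sketch (no Spark dependency) of why two keys can share a partition. It assumes groupByKey(12) falls back to Spark's default HashPartitioner, which places a key at the non-negative value of key.hashCode() modulo the number of partitions. The class PartitionDemo, the helper onePartitionPerKey, and the sample keys "ab" and "an" are made up for illustration and are not part of the code above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionDemo {
    // Same rule as hash partitioning: non-negative (hashCode mod numPartitions).
    static int hashPartition(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Hypothetical helper: assign every distinct key its own partition index,
    // in order of first appearance.
    static Map<String, Integer> onePartitionPerKey(List<String> keys) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String k : keys) {
            index.putIfAbsent(k, index.size());
        }
        return index;
    }

    public static void main(String[] args) {
        // "ab" and "an" have hash codes that differ by exactly 12,
        // so they land in the same partition when there are 12 partitions.
        System.out.println("ab -> " + hashPartition("ab", 12));
        System.out.println("an -> " + hashPartition("an", 12));

        // With an explicit key-to-index table, each key gets a distinct slot.
        System.out.println(onePartitionPerKey(Arrays.asList("ab", "an", "ab")));
    }
}
```

In Spark itself, the same lookup table could presumably live inside a subclass of org.apache.spark.Partitioner (overriding numPartitions() and getPartition(Object key)) and be passed to groupByKey in place of the integer 12. The number of partitions would then have to be at least the number of distinct keys.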
