JavaRDD<String> input = xyz.sc.textFile("/home/spark/Documents/XYZ");
JavaRDD<String> infoRDD = input
    // Key each line by its first two characters.
    .mapToPair(new PairFunction<String, String, String>() {
        public Tuple2<String, String> call(String x) {
            return new Tuple2<String, String>(x.substring(0, 2), x);
        }
    })
    // Group the lines by key, asking for 12 partitions.
    .groupByKey(12)
    // Flatten each group back into individual lines.
    .flatMap(new FlatMapFunction<Tuple2<String, Iterable<String>>, String>() {
        public Iterable<String> call(Tuple2<String, Iterable<String>> t) throws Exception {
            return t._2();
        }
    });
Above is my code, in which I distribute data to different partitions based on key. However, some partitions end up holding data for two different keys, whereas I expect all data for a given key to be stored in its own partition.
EXPECTED
key1(data)->partition1
key2(data)->partition2
key3(data)->partition3
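
A possible explanation for what I am seeing, sketched in plain Java without Spark: as far as I understand, groupByKey(12) uses hash partitioning, so the partition index is the key's hashCode() modulo the number of partitions, kept non-negative. The class name, the method name partitionFor, and the sample keys below are hypothetical and only illustrate that two distinct keys can map to the same index.

```java
public class PartitionSketch {
    // Assumption: mirrors hash partitioning, i.e. key.hashCode() modulo
    // the partition count, with the result kept non-negative.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 12; // same count as passed to groupByKey(12)
        String[] keys = {"ab", "ac", "an"}; // hypothetical two-character keys
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

With these sample keys, "ab" and "an" both map to index 9 while "ac" maps to index 10, so under this assumption a one-key-per-partition layout would not follow from the partition count alone.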