I have an application that works with JavaDStream objects. This is the piece of code where I compute how often each word appears.

JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
      new PairFunction<String, String, Integer>() {
        @Override
        public Tuple2<String, Integer> call(String s) {
          return new Tuple2<>(s, 1);
        }
      }).reduceByKey(new Function2<Integer, Integer, Integer>() {
        @Override
        public Integer call(Integer i1, Integer i2) {
          return i1 + i2;
        }
      });

Now, if I want to print the top N most frequent elements, sorted by the Integer value, how can I do this when there are no methods like sortByKey (as there are for JavaPairRDD)?

abaghel
sirdan
  • You can implement the method yourself. – Wang Jun 10 '17 at 16:38
  • Yes, I thought of a workaround, but it would not let me keep working with JavaDStreams; I would just have RDDs. – sirdan Jun 10 '17 at 16:40
  • I think that because streaming data arrives continuously, it's hard to sort. But you can save it first. – Wang Jun 10 '17 at 17:00

2 Answers

Since you have a JavaPairDStream<String, Integer> and want to sort by the Integer value, you first have to swap each pair so that the count becomes the key.

JavaPairDStream<Integer,String> swappedPair = wordCounts.mapToPair(x -> x.swap());

Now you can sort with transformToPair, calling sortByKey on each underlying RDD. Passing false sorts in descending order.

JavaPairDStream<Integer, String> sortedStream = swappedPair.transformToPair(
      new Function<JavaPairRDD<Integer, String>, JavaPairRDD<Integer, String>>() {
        @Override
        public JavaPairRDD<Integer, String> call(JavaPairRDD<Integer, String> jPairRDD) throws Exception {
          return jPairRDD.sortByKey(false);
        }
      });

sortedStream.print();
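
Because each batch is sorted in descending order, sortedStream.print(n) should give the top n per batch (print() without an argument shows the first ten elements). To make the per-batch logic concrete, here is a minimal plain-Java sketch without Spark. It is only an illustration: the names TopNSketch and topN are made up, a Map stands in for one batch of (word, count) pairs, and n is assumed to be non-negative.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopNSketch {

    // Hypothetical helper: mimics swap + sortByKey(false) + "take n" on a single batch.
    static List<Map.Entry<Integer, String>> topN(Map<String, Integer> wordCounts, int n) {
        // Swap (word, count) into (count, word) so the count becomes the key.
        List<Map.Entry<Integer, String>> swapped = new ArrayList<>();
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            swapped.add(new SimpleEntry<>(e.getValue(), e.getKey()));
        }
        // Sort by key in descending order, like sortByKey(false).
        swapped.sort((a, b) -> Integer.compare(b.getKey(), a.getKey()));
        // Keep at most the first n entries.
        return new ArrayList<>(swapped.subList(0, Math.min(n, swapped.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("spark", 5);
        counts.put("stream", 3);
        counts.put("java", 1);
        System.out.println(topN(counts, 2)); // prints [5=spark, 3=stream]
    }
}
```

In the streaming job the same three steps run once per batch interval, so the result is the top n of each batch rather than of the whole stream.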
abaghel
A simplification using countByValue and lambdas:

JavaPairDStream<String, Long> counts = lines.countByValue();
JavaPairDStream<Long, String> swappedPair = counts.mapToPair(Tuple2::swap);
JavaPairDStream<Long, String> sortedStream = swappedPair.transformToPair(s -> s.sortByKey(false));
petertc