val sorting = sc.parallelize(List(1,1,1,2,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,7,8,8,8,8,8))
sorting.map(x=>(x,1)).reduceByKey((a,b)=>a+b).map(x=>(x._1,"==>",x._2)).sortBy(s=>s._2,false).collect.foreach(println)    
Output:
(8,==>,5)
(1,==>,3)
(2,==>,4)
(3,==>,3)
(4,==>,4)
(5,==>,3)
(6,==>,2)
(7,==>,1)

I want to show only the top 3 results and remove the commas from the printed output.

mck
Learner
  • Get rid of `.map(x=>(x._1,"==>",x._2))`. Write your own separate function which prints a pair the way you want it to. Instead of `println` in the `foreach`, put your custom printing function. – Stef Dec 07 '20 at 18:48
  • See also: [Top n items from a spark dataframe rdd](https://stackoverflow.com/questions/48775083/top-n-items-from-a-spark-dataframe-rdd) – Stef Dec 07 '20 at 18:50
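
A minimal sketch of the custom printing function suggested above, assuming each element is an `(Int, Int)` pair of value and count. The names `PairPrinter`, `formatPair` and `printPair` are made up for illustration, and the local `Seq` only stands in for the collected RDD results.

```scala
object PairPrinter {
  // Hypothetical helper: renders a (value, count) pair without tuple punctuation.
  def formatPair(pair: (Int, Int)): String = {
    val (value, count) = pair
    s"$value ==> $count"
  }

  // Hypothetical helper: prints one pair using formatPair.
  def printPair(pair: (Int, Int)): Unit = println(formatPair(pair))

  def main(args: Array[String]): Unit = {
    // Local stand-in for pairs already brought back to the driver.
    val counts = Seq((8, 5), (2, 4), (4, 4))
    counts.foreach(printPair)
    // prints:
    // 8 ==> 5
    // 2 ==> 4
    // 4 ==> 4
  }
}
```

With the RDD from the question, the `.map(x=>(x._1,"==>",x._2))` step would be dropped and the function passed to `foreach` in place of `println`, for example `.take(3).foreach(PairPrinter.printPair)`.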

1 Answer

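Use `take(3)` instead of `collect` to get the top 3 results, and format each pair as a string to drop the tuple punctuation. The sort has to come before the formatting step: in the question, `sortBy(s=>s._2,false)` runs after the `map`, so `s._2` is the string `"==>"` rather than the count, which is why that output is not ordered by count.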

sorting.map(x=>(x,1)).reduceByKey((a,b)=>a+b).sortBy(s=>s._2,false).map(x=>s"${x._1} ${x._2}").take(3).foreach(println)

8 5
2 4
4 4
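
As an alternative sketch, `RDD.top` with an explicit `Ordering` can return the three largest counts without sorting the whole RDD first. The object name `TopThree`, the ordering name `byCountThenValue`, and the local master setting are assumptions for a standalone run; in `spark-shell` the existing `sc` would be used instead. The tie-break on the smaller value is also an assumption, added so that equal counts come back in a fixed order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TopThree {
  // Hypothetical ordering: higher count ranks first; on equal counts the smaller value ranks first.
  val byCountThenValue: Ordering[(Int, Int)] =
    Ordering.by[(Int, Int), (Int, Int)](p => (p._2, -p._1))

  def main(args: Array[String]): Unit = {
    // App name and local master are assumptions for running outside spark-shell.
    val conf = new SparkConf().setAppName("TopThree").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sorting = sc.parallelize(
        List(1,1,1,2,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,7,8,8,8,8,8))

      // Count occurrences of each value as (value, count) pairs.
      val counts = sorting.map(x => (x, 1)).reduceByKey(_ + _)

      // top(n) returns the n largest elements under the ordering, largest first.
      val top3 = counts.top(3)(byCountThenValue)

      top3.foreach { case (value, count) => println(s"$value $count") }
      // prints:
      // 8 5
      // 2 4
      // 4 4
    } finally {
      sc.stop()
    }
  }
}
```

Without the tie-break, the pairs for 2 and 4 (both with count 4) could come back in either order.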
mck