
My text file contains the following data:

10,14,16,19,52
08,09,12,20,45
55,56,70,78,53

I want to sort each row in descending order. I have tried the code below:

val file = sc.textFile("Maximum values").map(x=>x.split(","))
val sorted = file.sortBy(x=> -x(2).toInt)
sorted.collect()

I got the following output:

[[55, 56, 70, 78, 53], [10, 14, 16, 19, 52], [08, 09, 12, 20, 45]]

The above result shows that the entire list of rows has been sorted in descending order, but what I want is to sort the values within each row in descending order.

E.g.

[10,14,16,19,52],[08,09,12,20,45],[55,56,70,78,53]

should be

[52,19,16,14,10],[45,20,12,09,08],[78,70,56,55,53]

Please spare some time to answer this. Thanks in advance.

Prasad Khode

3 Answers


Here is one way (untested):

val reverseStringOrdering = Ordering[String].reverse
val file = sc.textFile("Maximum values").map(x=>x.split(",").sorted(reverseStringOrdering))
val sorted = file.sortBy(r => r, ascending = false)
sorted.collect()
Terry Dactyl
  • Thank you very much. But the sortBy function requires an implicit Ordering to be defined, so I added one, and the working code looks like this: val reverseStringOrdering = Ordering[String].reverse val file = sc.textFile("/user/rahimenzo4891/Datasets/Maximum values").map(x=>x.split(",").sorted(reverseStringOrdering)) val sorted = file.sortBy(r => r(1),ascending = true) sorted.collect() – abdul rahim Sep 28 '18 at 12:09
  • If you sort by element 1, i.e r(1) you are not guaranteeing your lists are sorted in the correct order. – Terry Dactyl Sep 28 '18 at 13:19
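The point in the last comment can be sketched in plain Scala, without Spark: importing `scala.math.Ordering.Implicits._` provides an `Ordering[Seq[Int]]` that compares whole rows element by element, instead of keying on a single element such as `r(1)`. The `Seq` values below are just the sample rows from the question.

```scala
import scala.math.Ordering.Implicits._ // brings in an Ordering for Seq[Int]

// Sample rows from the question, already sorted descending within each row
val rows = Seq(
  Seq(52, 19, 16, 14, 10),
  Seq(45, 20, 12, 9, 8),
  Seq(78, 70, 56, 55, 53))

// A reversed Ordering[Seq[Int]] compares rows element by element,
// so the row starting with 78 comes first, then 52, then 45.
val sorted = rows.sorted(Ordering[Seq[Int]].reverse)
```

The same idea carries over to the RDD version: map each line to a `Seq[Int]`, sort within the row, then `sortBy(identity)` with the reversed sequence ordering in scope.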

The Spark SQL way:

import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF (already in scope in spark-shell)

val df = Seq(
  ("10", "14", "16", "19", "52"),
  ("08", "09", "12", "20", "45"),
  ("55", "56", "70", "78", "53")).toDF("C1", "C2", "C3", "C4", "C5")

df.withColumn("sortedCol", sort_array(array("C1", "C2", "C3", "C4", "C5"), asc = false))
  .select("sortedCol")
  .show()

Output

+--------------------+
|           sortedCol|
+--------------------+
|[52, 19, 16, 14, 10]|
|[45, 20, 12, 09, 08]|
|[78, 70, 56, 55, 53]|
+--------------------+
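One caveat with this approach: the columns are strings, so sort_array compares them lexicographically. That happens to give the right answer here because every value has exactly two digits, but it would misorder mixed-width input such as "5" and "18". A hedged variant (reusing the same df and column names from above) casts to int first so the comparison is numeric:

```scala
import org.apache.spark.sql.functions._

// Cast each column to int before building the array,
// so sort_array compares numerically rather than lexicographically.
val intCols = Seq("C1", "C2", "C3", "C4", "C5").map(c => col(c).cast("int"))

df.withColumn("sortedCol", sort_array(array(intCols: _*), asc = false))
  .select("sortedCol")
  .show()
```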
Karthick

Check this:

val file = spark.sparkContext.textFile("in/sort.dat")
  .map { x => val y = x.split(','); y.sorted.reverse.mkString(",") }
file.collect.foreach(println)

EDIT1: how the individual methods used above behave in the REPL.

scala> val a = "10,14,16,19,52"
a: String = 10,14,16,19,52

scala> val b = a.split(',')
b: Array[String] = Array(10, 14, 16, 19, 52)

scala> b.sorted
res0: Array[String] = Array(10, 14, 16, 19, 52)

scala> b.sorted.reverse
res1: Array[String] = Array(52, 19, 16, 14, 10)

scala> b.sorted.reverse.mkString(",")
res2: String = 52,19,16,14,10

scala> b.sorted.reverse.mkString("*")
res3: String = 52*19*16*14*10


EDIT2: cast the values to Int after the split so the sort is numeric:

val file = spark.sparkContext.textFile("in/sort.dat")
  .map { x => val y = x.split(',').map(_.toInt); y.sorted.reverse.mkString(",") }
file.collect.foreach(println)
stack0114106
  • I'm a beginner with Spark and Scala. I would be very happy if you could explain the usage of the delimiter "," on the variable 'y', i.e. y.sorted.reverse.mkString(",") – abdul rahim Sep 28 '18 at 12:14
  • 'y' will be an Array of String. When you sort it using "sorted", the sort is alphabetical, so you get smallest to biggest in the Array. So reverse the Array using the "reverse" method; mkString then concatenates all the Array items using the delimiter that you specify, which is a comma here. I added "EDIT1" to the answer to show the results in the REPL. – stack0114106 Sep 28 '18 at 12:52
  • If you have a line like "5,18,26,72,61", it will sort as "72,61,5,26,18". So for integer sorting you have to cast the values to Int after the split; see my EDIT2. – stack0114106 Sep 28 '18 at 13:17
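The pitfall described in the last comment can be checked in plain Scala, with no Spark required, using the sample line from that comment:

```scala
val line = "5,18,26,72,61"

// String sort compares character by character: "5" sorts after "26"
// because '5' > '2', so the reversed string sort misorders the values.
val asStrings = line.split(',').sorted.reverse.mkString(",")
// asStrings == "72,61,5,26,18"

// Casting to Int first gives the intended numeric order.
val asInts = line.split(',').map(_.toInt).sorted.reverse.mkString(",")
// asInts == "72,61,26,18,5"
```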