
My text file contains the following data:

10,14,16,19,52
08,09,12,20,45
55,56,70,78,53

I want to sort each row in descending order. I have tried the code below:

val file = sc.textFile("Maximum values").map(x=>x.split(","))
val sorted = file.sortBy(x=> -x(2).toInt)
sorted.collect()

I got the following output:

[[55, 56, 70, 78, 53], [10, 14, 16, 19, 52], [08, 09, 12, 20, 45]]

The above result shows that the entire list of rows has been sorted in descending order, but what I want is to sort the values within each row in descending order.

E.g.

[10,14,16,19,52],[08,09,12,20,45],[55,56,70,78,53]

should be

[52,19,16,14,10],[45,20,12,09,08],[78,70,56,55,53]

Please spare some time to answer this. Thanks in advance.

Prasad Khode

3 Answers


Here is one way (untested):

val reverseStringOrdering = Ordering[String].reverse
val file = sc.textFile("Maximum values").map(x=>x.split(",").sorted(reverseStringOrdering))
val sorted = file.sortBy(r => r, ascending = false)
sorted.collect()
Terry Dactyl
  • Thank you very much. But the sortBy function requires an implicit Ordering to be defined, so I added one, and the working code looks like this: val reverseStringOrdering = Ordering[String].reverse val file = sc.textFile("/user/rahimenzo4891/Datasets/Maximum values").map(x=>x.split(",").sorted(reverseStringOrdering)) val sorted = file.sortBy(r => r(1),ascending = true) sorted.collect() – abdul rahim Sep 28 '18 at 12:09
  • If you sort by element 1, i.e r(1) you are not guaranteeing your lists are sorted in the correct order. – Terry Dactyl Sep 28 '18 at 13:19
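The point in the last comment can be sketched in plain Scala, without Spark: importing `scala.math.Ordering.Implicits._` provides an `Ordering[Seq[Int]]` that compares whole rows element by element, instead of keying on a single element such as `r(1)`. The `Seq` values below are just the sample rows from the question.

```scala
import scala.math.Ordering.Implicits._ // brings in an Ordering for Seq[Int]

// Sample rows from the question, already sorted descending within each row
val rows = Seq(
  Seq(52, 19, 16, 14, 10),
  Seq(45, 20, 12, 9, 8),
  Seq(78, 70, 56, 55, 53))

// A reversed Ordering[Seq[Int]] compares rows element by element,
// so the row starting with 78 comes first, then 52, then 45.
val sorted = rows.sorted(Ordering[Seq[Int]].reverse)
```

The same idea carries over to the RDD version: map each line to a `Seq[Int]`, sort within the row, then `sortBy(identity)` with the reversed sequence ordering in scope.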

The Spark SQL way:

import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF (already in scope in spark-shell)

val df = Seq(
  ("10", "14", "16", "19", "52"),
  ("08", "09", "12", "20", "45"),
  ("55", "56", "70", "78", "53")).toDF("C1", "C2", "C3", "C4", "C5")

df.withColumn("sortedCol", sort_array(array("C1", "C2", "C3", "C4", "C5"), asc = false))
  .select("sortedCol")
  .show()

Output

+--------------------+
|           sortedCol|
+--------------------+
|[52, 19, 16, 14, 10]|
|[45, 20, 12, 09, 08]|
|[78, 70, 56, 55, 53]|
+--------------------+
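One caveat with this approach: the columns are strings, so sort_array compares them lexicographically. That happens to give the right answer here because every value has exactly two digits, but it would misorder mixed-width input such as "5" and "18". A hedged variant (reusing the same df and column names from above) casts to int first so the comparison is numeric:

```scala
import org.apache.spark.sql.functions._

// Cast each column to int before building the array,
// so sort_array compares numerically rather than lexicographically.
val intCols = Seq("C1", "C2", "C3", "C4", "C5").map(c => col(c).cast("int"))

df.withColumn("sortedCol", sort_array(array(intCols: _*), asc = false))
  .select("sortedCol")
  .show()
```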
Karthick

Check this:

val file = spark.sparkContext.textFile("in/sort.dat")
  .map { x => val y = x.split(','); y.sorted.reverse.mkString(",") }
file.collect.foreach(println)

EDIT1: how the individual methods used above behave in the REPL.

scala> val a = "10,14,16,19,52"
a: String = 10,14,16,19,52

scala> val b = a.split(',')
b: Array[String] = Array(10, 14, 16, 19, 52)

scala> b.sorted
res0: Array[String] = Array(10, 14, 16, 19, 52)

scala> b.sorted.reverse
res1: Array[String] = Array(52, 19, 16, 14, 10)

scala> b.sorted.reverse.mkString(",")
res2: String = 52,19,16,14,10

scala> b.sorted.reverse.mkString("*")
res3: String = 52*19*16*14*10


EDIT2: cast the values to Int after the split so the sort is numeric:

val file = spark.sparkContext.textFile("in/sort.dat")
  .map { x => val y = x.split(',').map(_.toInt); y.sorted.reverse.mkString(",") }
file.collect.foreach(println)
stack0114106
  • I'm a beginner with Spark and Scala. I would be very happy if you could explain the usage of the delimiter "," on the variable 'y', i.e. y.sorted.reverse.mkString(",") – abdul rahim Sep 28 '18 at 12:14
  • 'y' will be an Array of String. When you sort it using "sorted", the sort is alphabetical, so you get smallest to biggest in the Array. So reverse the Array using the "reverse" method; mkString then concatenates all the Array items using the delimiter that you specify, which is a comma here. I added "EDIT1" to the answer to show the results in the REPL. – stack0114106 Sep 28 '18 at 12:52
  • If you have a line like "5,18,26,72,61", it will sort as "72,61,5,26,18". So for integer sorting you have to cast the values to Int after the split; see my EDIT2. – stack0114106 Sep 28 '18 at 13:17
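The pitfall described in the last comment can be checked in plain Scala, with no Spark required, using the sample line from that comment:

```scala
val line = "5,18,26,72,61"

// String sort compares character by character: "5" sorts after "26"
// because '5' > '2', so the reversed string sort misorders the values.
val asStrings = line.split(',').sorted.reverse.mkString(",")
// asStrings == "72,61,5,26,18"

// Casting to Int first gives the intended numeric order.
val asInts = line.split(',').map(_.toInt).sorted.reverse.mkString(",")
// asInts == "72,61,26,18,5"
```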