I have an RDD[org.joda.time.DateTime] and would like to sort its records by date in Scala.

Input (sample data after applying collect()):

res41: Array[org.joda.time.DateTime] = Array(2016-10-19T05:19:07.572Z, 2016-10-12T00:31:07.572Z, 2016-10-18T19:43:07.572Z)

Expected output:

2016-10-12T00:31:07.572Z 
2016-10-18T19:43:07.572Z   
2016-10-19T05:19:07.572Z

I have googled and checked the following link, but could not understand it:

How to define an Ordering in Scala?

Any help?

r4sn4

2 Answers

If you collect the records of your RDD to the driver, you can sort the resulting array by each value's epoch milliseconds:

array.sortBy(_.getMillis)
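
As a minimal, self-contained sketch of the collected case (assuming joda-time is on the classpath; the object name `SortCollected` and the hard-coded sample values are only for illustration and are taken from the question's data):

```scala
import org.joda.time.DateTime

object SortCollected {
  // Sample values from the question, parsed from ISO-8601 strings.
  val array: Array[DateTime] = Array(
    DateTime.parse("2016-10-19T05:19:07.572Z"),
    DateTime.parse("2016-10-12T00:31:07.572Z"),
    DateTime.parse("2016-10-18T19:43:07.572Z")
  )

  // sortBy extracts a Long key (epoch milliseconds) from each element
  // and relies on the standard Ordering[Long] to compare the keys.
  val sorted: Array[DateTime] = array.sortBy(_.getMillis)

  def main(args: Array[String]): Unit =
    sorted.foreach(println)
}
```

No custom Ordering is needed here because the sort key is a Long, which already has one.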

Alternatively, if your RDD is large and you do not want to collect it to the driver, sort it in a distributed fashion instead:

rdd.sortBy(_.getMillis)
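
A rough sketch of the distributed variant, assuming Spark core and joda-time are on the classpath and a local master is acceptable for trying it out. The names `SortRdd` and `sortByDate` are hypothetical helpers, not part of Spark:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.joda.time.DateTime

object SortRdd {
  // Hypothetical helper: sorts on the executors using epoch milliseconds as the key.
  // RDD.sortBy takes an optional `ascending` flag, which defaults to true.
  def sortByDate(rdd: RDD[DateTime], ascending: Boolean = true): RDD[DateTime] =
    rdd.sortBy(_.getMillis, ascending)

  def main(args: Array[String]): Unit = {
    // local[2] is only an assumption for a quick local run.
    val conf = new SparkConf().setAppName("sort-datetimes").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(
        DateTime.parse("2016-10-19T05:19:07.572Z"),
        DateTime.parse("2016-10-12T00:31:07.572Z"),
        DateTime.parse("2016-10-18T19:43:07.572Z")
      ))
      // collect() is used here only to print the small sample.
      sortByDate(rdd).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Passing `ascending = false` to the helper should give newest-first order.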
Anton Okolnychyi

You can define an implicit ordering for org.joda.time.DateTime like so:

implicit def ord: Ordering[DateTime] = Ordering.by(_.getMillis)

This ordering compares two DateTime values by their epoch milliseconds.

You can then either make sure the implicit is in scope where you sort, or pass it explicitly:

arr.sorted(ord)
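
To show both styles side by side, here is a small sketch assuming joda-time is on the classpath. The object name `DateTimeOrdering` and the value names are made up for the example, and an implicit val is used in place of the implicit def above:

```scala
import org.joda.time.DateTime

object DateTimeOrdering {
  // Compare DateTime values by their epoch milliseconds.
  implicit val dateTimeOrdering: Ordering[DateTime] =
    Ordering.by[DateTime, Long](_.getMillis)

  val arr: Array[DateTime] = Array(
    DateTime.parse("2016-10-19T05:19:07.572Z"),
    DateTime.parse("2016-10-12T00:31:07.572Z"),
    DateTime.parse("2016-10-18T19:43:07.572Z")
  )

  // The implicit is in scope, so sorted and max pick it up automatically.
  val ascending: Array[DateTime] = arr.sorted
  val latest: DateTime = arr.max

  // Passing an ordering explicitly, here the reversed one for newest-first.
  val descending: Array[DateTime] = arr.sorted(dateTimeOrdering.reverse)

  def main(args: Array[String]): Unit =
    ascending.foreach(println)
}
```

Once such an ordering is in scope, other methods that take an implicit Ordering (such as min and max) work on DateTime collections too.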
Ashesh