
I have tried this code to find the minimum and maximum value for each key, but I got an error:

val keysWithValuesList = Array("1=2000", "2=1800", "2=3000", "3=2500", "4=1500")
val data = sc.parallelize(keysWithValuesList,2)
val kv = data.map(_.split("=")).map(v => (1, v(1).toInt))
val initialCount = kv.first._2
val maxi = (x: Int, y: Int) => if (x>y) x else y 
val mini = (x: Int, y: Int) => if (x>y) y else x 
val maxP = (p1: Int, p2: Int) => if (p1>p2) p1 else p2
val minP = (p1: Int, p2: Int) => if (p1>p2) p2 else p1
val max_min = kv.aggregateByKey(initialCount)((maxi,mini),(maxP,minP))

The error is:

command-2654386024166474:13: error: type mismatch;
 found   : ((Int, Int) => Int, (Int, Int) => Int)
 required: (Int, Int) => Int
val max_min = kv.aggregateByKey(initialCount)((maxi,mini),(maxP,minP))
                                              ^
command-2654386024166474:13: error: type mismatch;
 found   : ((Int, Int) => Int, (Int, Int) => Int)
 required: (Int, Int) => Int
val max_min = kv.aggregateByKey(initialCount)((maxi,mini),(maxP,minP))

Is there any other method? Please suggest one.

  • Use Dataset or at least DataFrame APIs, as they have built-in max and min functions. You might like to have a look at https://stackoverflow.com/questions/43232363/get-min-and-max-from-a-specific-column-scala-spark-dataframe – hagarwal Feb 21 '20 at 07:54
  • I don't want to use built-in functions or DataFrames – arpit agrawal Feb 21 '20 at 08:07
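
As an aside on hagarwal's comment above: a minimal sketch of the DataFrame route, assuming a SparkSession named spark is in scope (as in spark-shell or a Databricks notebook). Column names are illustrative:

import spark.implicits._
import org.apache.spark.sql.functions.{max, min}

// Reuse the (key, value) RDD from the question
val df = kv.toDF("key", "value")
df.groupBy("key").agg(min("value"), max("value")).show()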

1 Answer


The error occurs because aggregateByKey expects each of its two arguments to be a single function of type (Int, Int) => Int, not a tuple of two functions. It is still possible to do two reduce operations in one pass, but you need to carry both results in a tuple. First reformat your RDD to duplicate the value:

val rddMinMax = kv.map(x => (x._1, (x._2, x._2)))

Then use this function to reduce twice on each pair:

val minAndMax = (l1: (Int, Int), l2: (Int, Int)) => (mini(l1._1, l2._1), maxi(l1._2, l2._2))
rddMinMax.reduceByKey(minAndMax).collect()
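
For reference, aggregateByKey itself also works once the zero value and both functions operate on a (min, max) pair, since it expects a single seqOp and a single combOp. A minimal sketch, reusing mini and maxi from the question:

// Start from (Int.MaxValue, Int.MinValue) so any real value replaces it,
// which avoids the kv.first lookup used for initialCount
val minMax = kv.aggregateByKey((Int.MaxValue, Int.MinValue))(
  (acc, v) => (mini(acc._1, v), maxi(acc._2, v)), // seqOp: fold each value into the accumulator
  (a, b) => (mini(a._1, b._1), maxi(a._2, b._2))  // combOp: merge accumulators across partitions
)
minMax.collect()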
– Mariusz