
For each row of a DataFrame, I would like to extract the maximum value and put it in a new column. The example code below gives me a DataFrame ('dfmax') containing the maximum of each row:

  val donuts = Seq((2.0, 1.50, 3.5), (4.2, 22.3, 10.8), (33.6, 2.50, 7.3))
  val df = sparkSession
    .createDataFrame(donuts)
    .toDF("col1", "col2", "col3")
  df.show()

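  import sparkSession.implicits._
  // Read the column names once on the driver so the lambda only captures an Array[String]
  val names = df.schema.fieldNames
  // For each Row, build a column-name -> value map and keep the largest value
  val dfmax = df.map(r => r.getValuesMap[Double](names).values.max)
  dfmax.show()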

This gives me df:

+----+----+----+
|col1|col2|col3|
+----+----+----+
| 2.0| 1.5| 3.5|
| 4.2|22.3|10.8|
|33.6| 2.5| 7.3|
+----+----+----+

and dfmax:

+-----+
|value|
+-----+
|  3.5|
| 22.3|
| 33.6|
+-----+
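For comparison, the row-wise `map` approach above drops the original columns because the lambda returns only the maximum. A minimal sketch, assuming exactly three Double columns named as in the example and a hypothetical helper `withRowMax`, keeps them by returning a tuple of the original values plus their maximum:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RowMaxWithMap {
  // Hypothetical helper: assumes df has exactly three Double columns.
  // Returns those three columns plus a "max" column computed row by row.
  def withRowMax(df: DataFrame): DataFrame = {
    val spark = df.sparkSession
    import spark.implicits._
    val names = df.columns // read on the driver; the lambda only captures this Array[String]
    df.map { r =>
        val values = names.map(n => r.getAs[Double](n))
        (values(0), values(1), values(2), values.max)
      }
      .toDF("col1", "col2", "col3", "max")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("row-max-map").getOrCreate()
    import spark.implicits._

    val df = Seq((2.0, 1.50, 3.5), (4.2, 22.3, 10.8), (33.6, 2.50, 7.3)).toDF("col1", "col2", "col3")
    withRowMax(df).show()
    spark.stop()
  }
}
```

The fixed-size tuple is what ties this sketch to three columns; for an arbitrary number of columns a column-expression approach is usually simpler.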

I would like to combine these two frames into one table, preferably using .withColumn or something similar, in a style like this (which I cannot get to work):

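  def maxValue(data: DataFrame): DataFrame = {
    val dfmax = data.map(r => r.getValuesMap[Double](data.schema.fieldNames).map(r => r._2).max)
    dfmax
  }
  // Fails: a UDF wraps a function over column values and is applied to Column arguments,
  // so a whole DataFrame cannot be passed to it
  val udfMaxValue = udf(maxValue _)
  df.withColumn("max", udfMaxValue(df))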
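A `udf` operates on the values of individual columns, not on a whole DataFrame, so the attempt above cannot work as written. A minimal sketch of the `.withColumn` style, assuming all columns are numeric and using Spark's built-in `greatest` function instead of a UDF (the object and helper names are hypothetical):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, greatest}

object RowMaxWithGreatest {
  // Adds a "max" column holding the largest value across all existing columns of each row.
  // greatest() skips nulls and requires at least two columns.
  def withMaxColumn(df: DataFrame): DataFrame =
    df.withColumn("max", greatest(df.columns.toSeq.map(col): _*))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("row-max-greatest").getOrCreate()
    import spark.implicits._

    val df = Seq((2.0, 1.50, 3.5), (4.2, 22.3, 10.8), (33.6, 2.50, 7.3)).toDF("col1", "col2", "col3")
    withMaxColumn(df).show()
    spark.stop()
  }
}
```

With the example data, the `max` column should contain 3.5, 22.3 and 33.6. Because `greatest` is evaluated as a column expression, no row-wise `map` or separate DataFrame is needed; to take the maximum over only some columns, pass just those column names.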

Christian
