I have a dataset like this:
id category value
1 A NaN
2 B NaN
3 A 10.5
5 A 2.0
6 B 1.0
I want to fill the NaN values with the mean of their respective category, as shown below:
id category value
1 A 4.16
2 B 0.5
3 A 10.5
5 A 2.0
6 B 1.0
I first tried to calculate the mean value of each category using groupBy:
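import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.mean

// Collect the per-category means to the driver as a map from category to mean
val df2 = dataFrame.groupBy("category").agg(mean("value")).rdd.map {
  case r: Row => (r.getAs[String]("category"), r.get(1))
}.collect().toMap
println(df2)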
This gave me a map from each category to its mean value. Output: Map(A -> 4.16, B -> 0.5)
Next I tried an UPDATE query in Spark SQL to fill the column, but it seems Spark SQL doesn't support UPDATE queries. I also tried to fill the null values within the DataFrame but failed to do so.
What can I do? The same can be done in pandas, as shown in "Pandas: How to fill null values with mean of a groupby?"
But how can I do it with a Spark DataFrame?
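One direction that might work, sketched here under assumptions rather than as a definitive answer: the pandas groupby/transform idiom seems to map onto a window partitioned by category. Because DataFrames are immutable, the "update" becomes a new DataFrame derived with withColumn. The local SparkSession, the modelling of missing values as Option[Double], and the names cleaned, byCategory and filled are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, coalesce, col, isnan, when}

// Local session for illustration only; a real job would likely reuse an existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("fill-by-group-mean")
  .getOrCreate()
import spark.implicits._

// Sample data from the question; missing values are modelled as None (null).
val rows: Seq[(Int, String, Option[Double])] = Seq(
  (1, "A", None),
  (2, "B", None),
  (3, "A", Some(10.5)),
  (5, "A", Some(2.0)),
  (6, "B", Some(1.0))
)
val dataFrame = rows.toDF("id", "category", "value")

// Turn NaN into null so that avg, which skips nulls, ignores both kinds of
// missing value. `when` without `otherwise` yields null when the condition fails.
val cleaned = when(!isnan(col("value")), col("value"))

// Every row sees the mean of the non-missing values in its own category.
val byCategory = Window.partitionBy("category")

// Keep the existing value where present, otherwise fall back to the category mean.
val filled = dataFrame.withColumn(
  "value",
  coalesce(cleaned, avg(cleaned).over(byCategory))
)

filled.orderBy("id").show()
```

Note that this takes the mean over the non-missing values only, so for the sample data it fills 6.25 for A and 1.0 for B. The 4.16 and 0.5 shown above are what you get when the missing values are counted as 0 (12.5 / 3 and 1.0 / 2).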