I have a dataframe that looks like this:
+--------+-----+--------------------+
|     uid|  iid|               color|
+--------+-----+--------------------+
|41344966| 1305|                 red|
|41344966| 1305|               green|
I want to get to this as efficiently as possible:
+--------+--------------------+
|     uid|     recommendations|
+--------+--------------------+
|41344966|      [[2174, red...|
|41345063|    [[2174, green...|
|41346177|   [[2996, orange...|
|41349171|   [[2174, purple...|
res98: org.apache.spark.sql.Dataset[userRecs] = [uid: int, recommendations: array<struct<iid:int,color:string>>]
So I want to group the records by uid and collect each group into an array of objects, where each object is an instance of a case class with fields iid and color:
case class itemData (iid: Int, color: String)
case class userRecs (uid: Int, recommendations: Array[itemData])
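
For reference, here is a minimal sketch of the kind of transformation I am after, assuming Spark 2.x or later. It packs iid and color into a struct, gathers the structs per uid with collect_list, and converts the result to the typed Dataset. The object name GroupRecs, the helper toUserRecs, and the local SparkSession settings are placeholders made up for this example, and I have not measured whether this is the most efficient approach.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Same case classes as above; they must live at top level so Spark can derive encoders.
case class itemData(iid: Int, color: String)
case class userRecs(uid: Int, recommendations: Array[itemData])

object GroupRecs {
  // Hypothetical helper: groups rows by uid and collects (iid, color) pairs into an array.
  def toUserRecs(spark: SparkSession, df: DataFrame): Dataset[userRecs] = {
    import spark.implicits._
    df.groupBy(col("uid"))
      .agg(collect_list(struct(col("iid"), col("color"))).as("recommendations"))
      .as[userRecs]
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("group-recs").getOrCreate()
    import spark.implicits._

    // The two sample rows from the first table above.
    val df = Seq(
      (41344966, 1305, "red"),
      (41344966, 1305, "green")
    ).toDF("uid", "iid", "color")

    toUserRecs(spark, df).show(false)
    spark.stop()
  }
}
```

Note that collect_list does not guarantee the order of elements within each array, so if the order of recommendations matters, it would need to be sorted explicitly afterwards.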