I have suggested a case class approach in the link you provided in the question. Here's a different approach.
RDD way
You can simply do the following:
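val rdd = sc.parallelize(test) // `test` is the sequence of (city, name, value) tuples from the question
val resultRdd = rdd.groupBy(x => x._1) // group by the first element (the city)
  .mapValues(x => x.map(y => (y._2, y._3))) // keep only the second and third elements of each tuple in a group
resultRdd.collect().foreach(println) // collect first so the rows are printed on the driver, not on the executors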
should give you (the order of the groups may differ between runs):
(New York,List((Jack,jdhj)))
(Houston,List((John,dd)))
(Chicago,List((David,ff), (Andrew,ddd)))
(Detroit,List((Michael,fff), (Peter,dd), (George,dkdjkd)))
(Los Angeles,List((Tom,ff)))
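If you want to check the grouping logic without a SparkContext, the same idea can be sketched on a plain Scala `List`. This is a local sketch, not the Spark code itself: the `test` data below is an assumption, reconstructed from the output shown above rather than copied from the question, and `GroupByCity` is a hypothetical name.

```scala
// Local sketch of the same grouping on a plain Scala List (no Spark needed).
// ASSUMPTION: `test` holds (city, name, value) tuples; this data is
// reconstructed from the output above, not taken from the question.
object GroupByCity {
  val test: List[(String, String, String)] = List(
    ("New York", "Jack", "jdhj"),
    ("Houston", "John", "dd"),
    ("Chicago", "David", "ff"),
    ("Chicago", "Andrew", "ddd"),
    ("Detroit", "Michael", "fff"),
    ("Detroit", "Peter", "dd"),
    ("Detroit", "George", "dkdjkd"),
    ("Los Angeles", "Tom", "ff")
  )

  // Group by the first element, then keep only the second and third
  // elements of each tuple (mirrors groupBy + mapValues on the RDD).
  val grouped: Map[String, List[(String, String)]] =
    test.groupBy(_._1).map { case (city, rows) =>
      city -> rows.map(row => (row._2, row._3))
    }

  def main(args: Array[String]): Unit =
    grouped.foreach(println) // Map iteration order is not guaranteed
}
```

On Scala 2.13 or later, `test.groupMap(_._1)(t => (t._2, t._3))` does the grouping and the projection in a single call.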
Converting the RDD to a DataFrame
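If you need the output in table format, you can call .toDF() after a little manipulation (converting each group to an Array):
import spark.implicits._ // needed for .toDF(); assumes `spark` is your SparkSession, as in spark-shell
val df = resultRdd.map(x => (x._1, x._2.toArray)).toDF()
df.show(false) // false = do not truncate long cell values
which should give you: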
+-----------+--------------------------------------------+
|_1         |_2                                          |
+-----------+--------------------------------------------+
|New York   |[[Jack,jdhj]]                               |
|Houston    |[[John,dd]]                                 |
|Chicago    |[[David,ff], [Andrew,ddd]]                  |
|Detroit    |[[Michael,fff], [Peter,dd], [George,dkdjkd]]|
|Los Angeles|[[Tom,ff]]                                  |
+-----------+--------------------------------------------+