I have a DataFrame with three columns: Id, First Name, and Last Name.

I want to group by Id and collect the First Name and Last Name columns as lists.

Example: I have a DataFrame like this:

+---+-------+--------+
|id |fName  |lName   |
+---+-------+--------+
|1  |Akash  |Sethi   |
|2  |Kunal  |Kapoor  |
|3  |Rishabh|Verma   |
|2  |Sonu   |Mehrotra|
+---+-------+--------+

and I want my output like this

+---+-------------+------------------+
|id |fName        |lName             |
+---+-------------+------------------+
|1  |[Akash]      |[Sethi]           |
|2  |[Kunal, Sonu]|[Kapoor, Mehrotra]|
|3  |[Rishabh]    |[Verma]           |
+---+-------------+------------------+

Thanks in advance.

Akash Sethi

1 Answer

You can aggregate several columns in a single agg call after groupBy:

import org.apache.spark.sql.functions.collect_list
df.groupBy("id").agg(collect_list("fName"), collect_list("lName"))

This gives the expected result: one row per id, with the first names and last names each collected into a list.
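For a fuller picture, here is a minimal end-to-end sketch in Scala that rebuilds the sample data from the question and applies the same aggregation. The object name `CollectNamesById`, the helper `collectNames`, and the local session settings are assumptions made for illustration; they are not part of the original answer. The aliases are added because, without them, Spark names the result columns `collect_list(fName)` and `collect_list(lName)`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.collect_list

object CollectNamesById {

  // Hypothetical helper: groups by "id" and gathers both name columns into lists.
  def collectNames(df: DataFrame): DataFrame =
    df.groupBy("id")
      .agg(
        // Aliases keep the original column names instead of "collect_list(...)".
        collect_list("fName").as("fName"),
        collect_list("lName").as("lName")
      )
      // Sorting is only for readable output; groupBy does not guarantee row order.
      .orderBy("id")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("collect-list-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val df = Seq(
      (1, "Akash", "Sethi"),
      (2, "Kunal", "Kapoor"),
      (3, "Rishabh", "Verma"),
      (2, "Sonu", "Mehrotra")
    ).toDF("id", "fName", "lName")

    // truncate = false prints the full list contents.
    collectNames(df).show(false)

    spark.stop()
  }
}
```

Note that `collect_list` keeps duplicates and does not guarantee the order of elements within each list after a shuffle. If duplicates should be dropped, `collect_set` is the usual alternative.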

himanshuIIITian