I have a DataFrame with a 'count' per 'id' and 'type' combination (the result of an earlier groupBy on 'id' and 'type'):
+---+----+-----+
| id|type|count|
+---+----+-----+
| 0| A| 2|
| 0| B| 3|
| 0| C| 1|
| 0| D| 3|
| 0| G| 1|
| 1| A| 0|
| 1| C| 1|
| 1| D| 1|
| 1| G| 2|
+---+----+-----+
I would now like to group by 'id' and sum the 3 largest 'count' values per group:
+---+-----+
| id|count|
+---+-----+
| 0| 8|
| 1| 4|
+---+-----+
How can I do this in PySpark so that the computation stays relatively efficient?
I found a solution here.
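For reference, here is a minimal sketch of one common way to do this with a window function. The DataFrame name `df` and the inline sample data are assumptions for illustration; this is not necessarily the linked solution:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Sample data matching the table above (assumed names, for illustration only)
df = spark.createDataFrame(
    [(0, "A", 2), (0, "B", 3), (0, "C", 1), (0, "D", 3), (0, "G", 1),
     (1, "A", 0), (1, "C", 1), (1, "D", 1), (1, "G", 2)],
    ["id", "type", "count"],
)

# Rank rows within each id by count (descending), keep the top 3, then sum
w = Window.partitionBy("id").orderBy(F.col("count").desc())
result = (
    df.withColumn("rn", F.row_number().over(w))
      .filter(F.col("rn") <= 3)
      .groupBy("id")
      .agg(F.sum("count").alias("count"))
)

result.orderBy("id").show()
# +---+-----+
# | id|count|
# +---+-----+
# |  0|    8|
# |  1|    4|
# +---+-----+
```

Using row_number (rather than rank) guarantees exactly three rows per id even when counts tie, which matches "sum of the 3 largest values". The shuffle is limited to one window partitioning plus one aggregation, so it should scale reasonably well.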