I am trying to count the Date values for each unique ID in PySpark.
+-------------------+----------+
|               Date|        ID|
+-------------------+----------+
|2022-03-19 00:00:00|   Ax3838J|
|2022-03-11 00:00:00|   Ax3838J|
|2021-11-01 00:00:00|   Ax3838J|
|2021-10-27 00:00:00|   Ax3838J|
|2021-10-25 00:00:00|   Bz3838J|
|2021-10-22 00:00:00|   Bz3838J|
|2021-10-18 00:00:00|   Bz3838J|
|2021-10-15 00:00:00|   Rr742uL|
|2021-09-22 00:00:00|   Rr742uL|
+-------------------+----------+
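For reference, here is a minimal snippet that reproduces the DataFrame above (this is just a sketch; it assumes a plain SparkSession with no special config, and uses the column names shown in the table):

from datetime import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Rebuild the sample data shown above: Date as timestamps, ID as strings
rows = [
    (datetime(2022, 3, 19), "Ax3838J"),
    (datetime(2022, 3, 11), "Ax3838J"),
    (datetime(2021, 11, 1), "Ax3838J"),
    (datetime(2021, 10, 27), "Ax3838J"),
    (datetime(2021, 10, 25), "Bz3838J"),
    (datetime(2021, 10, 22), "Bz3838J"),
    (datetime(2021, 10, 18), "Bz3838J"),
    (datetime(2021, 10, 15), "Rr742uL"),
    (datetime(2021, 9, 22), "Rr742uL"),
]
df = spark.createDataFrame(rows, ["Date", "ID"])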
When I tried
df.groupBy('ID').count('Date').show()
I got the error:
_api() takes 1 positional argument but 2 were given
which makes sense, since GroupedData.count() takes no column argument, but I am not sure what other techniques exist in PySpark to do this kind of count.
How do I count unique Date values per ID? The closest I got was:
df.groupBy('ID').count().show()
Expected output:
+-----+----------+
|count|        ID|
+-----+----------+
|    4|   Ax3838J|
|    3|   Bz3838J|
|    2|   Rr742uL|
+-----+----------+
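From searching around, I suspect an aggregation like countDistinct might be what I need, but I am not sure it is the idiomatic approach. A sketch of what I mean (using the df built above; the alias is only there to match my expected output):

from pyspark.sql import functions as F

# Count distinct Date values per ID; plain count() would count rows instead,
# which coincides here only because every date in the sample is unique
df.groupBy("ID").agg(F.countDistinct("Date").alias("count")).show()

Is this the right approach, or is there a more standard way to do it?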