I am new to PySpark.
I am trying to perform a groupBy operation to get an aggregated count, but I cannot work out how to group by time frequency. I need to group on the fields CAPTUREDTIME, NODE, CHANNEL, LOCATION and TACK, with CAPTUREDTIME bucketed hourly, daily, weekly or monthly.
Sample data:
+-----------------+--------+-------+--------+--------+
|CAPTUREDTIME     |NODE    |CHANNEL|LOCATION|TACK    |
+-----------------+--------+-------+--------+--------+
|20-05-09 03:06:21|PUSC_RES|SIMPLEX|NORTH_AL|UE220034|
|20-05-09 04:33:04|PUSC_RES|SIMPLEX|SOUTH_AL|UE220034|
|20-05-09 12:04:52|TESC_RES|SIMPLEX|NORTH_AL|UE220057|
|20-05-10 04:24:09|TESC_RES|SIMPLEX|NORTH_AL|UE220057|
|20-05-10 04:33:04|PUSC_RES|SIMPLEX|SOUTH_AL|UE220034|
|20-04-09 10:57:48|TESC_RES|SIMPLEX|NORTH_AL|UE220057|
|20-04-09 12:12:26|TESC_RES|SIMPLEX|NORTH_AL|UE220057|
|20-04-09 03:26:33|PUSC_RES|SIMPLEX|NORTH_AL|UE220071|
+-----------------+--------+-------+--------+--------+
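I have used the following PySpark code:
from pyspark.sql import functions as func  # assumed import behind the "func" alias
# Count rows per exact timestamp and dimension combination
df = df.groupby("CAPTUREDTIME", "NODE", "CHANNEL", "LOCATION", "TACK").agg(
    func.count("TACK").alias("count")
)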
How can I extend the above code to group hourly, daily, weekly and monthly?
I need the output in the format below (sample output only):
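HOURLY:
+-----------------+--------+-------+--------+--------+-----+
|CAPTUREDTIME     |NODE    |CHANNEL|LOCATION|TACK    |COUNT|
+-----------------+--------+-------+--------+--------+-----+
|20-05-09 03:00:00|PUSC_RES|SIMPLEX|NORTH_AL|UE220034|2    |
|20-05-09 04:00:00|PUSC_RES|SIMPLEX|SOUTH_AL|UE220034|2    |
+-----------------+--------+-------+--------+--------+-----+
DAILY:
+-----------------+--------+-------+--------+--------+-----+
|CAPTUREDTIME     |NODE    |CHANNEL|LOCATION|TACK    |COUNT|
+-----------------+--------+-------+--------+--------+-----+
|20-05-09 00:00:00|PUSC_RES|SIMPLEX|NORTH_AL|UE220034|1    |
|20-05-09 00:00:00|PUSC_RES|SIMPLEX|SOUTH_AL|UE220034|2    |
|20-05-09 00:00:00|TESC_RES|SIMPLEX|NORTH_AL|UE220057|3    |
+-----------------+--------+-------+--------+--------+-----+
WEEKLY:
+-----------------+--------+-------+--------+--------+-----+
|CAPTUREDTIME     |NODE    |CHANNEL|LOCATION|TACK    |COUNT|
+-----------------+--------+-------+--------+--------+-----+
|20-05-09 00:00:00|PUSC_RES|SIMPLEX|NORTH_AL|UE220034|1    |
+-----------------+--------+-------+--------+--------+-----+
MONTHLY:
+-----------------+--------+-------+--------+--------+-----+
|CAPTUREDTIME     |NODE    |CHANNEL|LOCATION|TACK    |COUNT|
+-----------------+--------+-------+--------+--------+-----+
|20-05-09 00:00:00|PUSC_RES|SIMPLEX|NORTH_AL|UE220034|1    |
+-----------------+--------+-------+--------+--------+-----+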
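To make the requested bucketing concrete, the grouping described above can be sketched in plain Python (not PySpark). This is only an illustration of the logic under stated assumptions: CAPTUREDTIME is taken to be a string in yy-MM-dd HH:mm:ss form, weeks are taken to start on Monday, and the helper names truncate and count_by_bucket are hypothetical. Each timestamp is floored to the start of its hour, day, week or month, and rows are then counted per bucket and per NODE, CHANNEL, LOCATION, TACK combination.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumption: CAPTUREDTIME strings are formatted as yy-MM-dd HH:mm:ss.
TIME_FORMAT = "%y-%m-%d %H:%M:%S"


def truncate(ts, freq):
    """Return the start of the hour/day/week/month bucket containing ts."""
    if freq == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if freq == "day":
        return day_start
    if freq == "week":
        # Assumption: weeks start on Monday (weekday() == 0).
        return day_start - timedelta(days=ts.weekday())
    if freq == "month":
        return day_start.replace(day=1)
    raise ValueError(f"unsupported frequency: {freq}")


def count_by_bucket(rows, freq):
    """Count rows per (bucket start, NODE, CHANNEL, LOCATION, TACK)."""
    counts = Counter()
    for captured, node, channel, location, tack in rows:
        bucket = truncate(datetime.strptime(captured, TIME_FORMAT), freq)
        key = (bucket.strftime(TIME_FORMAT), node, channel, location, tack)
        counts[key] += 1
    return counts


# The eight rows of the sample data from the question.
rows = [
    ("20-05-09 03:06:21", "PUSC_RES", "SIMPLEX", "NORTH_AL", "UE220034"),
    ("20-05-09 04:33:04", "PUSC_RES", "SIMPLEX", "SOUTH_AL", "UE220034"),
    ("20-05-09 12:04:52", "TESC_RES", "SIMPLEX", "NORTH_AL", "UE220057"),
    ("20-05-10 04:24:09", "TESC_RES", "SIMPLEX", "NORTH_AL", "UE220057"),
    ("20-05-10 04:33:04", "PUSC_RES", "SIMPLEX", "SOUTH_AL", "UE220034"),
    ("20-04-09 10:57:48", "TESC_RES", "SIMPLEX", "NORTH_AL", "UE220057"),
    ("20-04-09 12:12:26", "TESC_RES", "SIMPLEX", "NORTH_AL", "UE220057"),
    ("20-04-09 03:26:33", "PUSC_RES", "SIMPLEX", "NORTH_AL", "UE220071"),
]

hourly = count_by_bucket(rows, "hour")
daily = count_by_bucket(rows, "day")
weekly = count_by_bucket(rows, "week")
monthly = count_by_bucket(rows, "month")

for key, count in sorted(daily.items()):
    print(*key, count, sep=" | ")
```

In PySpark itself, the same flooring step is likely a job for func.to_timestamp("CAPTUREDTIME", "yy-MM-dd HH:mm:ss") followed by func.date_trunc with "hour", "day", "week" or "month", with the truncated column used in groupby in place of the raw CAPTUREDTIME. The format string is an assumption about the data and should be checked against the real column.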