How to resample a df with multiple columns

Question

I have a minute data for multiple requests. I would like to resample it to hourly and groupby the request so that i can get the total number of requests per hour

this is how the data looks like

    | RequestDate | Request | Count |
    | 2021-11-24 22:09:00 | Request 1 | 10 |
    | 2021-11-24 22:09:00 | Request 3 | 1 |
    | 2021-11-24 22:09:00 | Request 2 | 4 |
    | 2021-11-24 22:09:00 | Request 4 | 5 |
    | 2021-11-24 22:10:00 | Request 1 | 4 |
    | 2021-11-24 22:10:00 | Request 2 | 0 |
    | 2021-11-24 22:10:00 | Request 3 | 6 |
    | 2021-11-24 22:10:00 | Request 4 | 5 |
    | 2021-11-24 22:10:00 | Request 5 | 1 |

Output:

    | RequestDate | Request | Count |
    | 2021-11-24 22:00:00 | Request 1 | 14 |
    | 2021-11-24 22:00:00 | Request 2 | 4 |
    | 2021-11-24 22:00:00 | Request 3 | 7 |
    | 2021-11-24 22:00:00 | Request 4 | 10 |
    | 2021-11-24 22:00:00 | Request 5 | 1 |

I tried this but ended in an error:

    df_groupby = df.groupby(by=[df["RequestDate"].resample('h'), "Request"])
    
    df_groupby["Request"]
    
    KeyError: 'RequestDate'

df with test data can be created as follows

df = pd.read_csv("test_data.csv")

test_data.csv

RequestDate,Request,RequestCount
2021-11-18 00:00:00,Request1,4
2022-11-18 00:00:00,Request2,4
2022-11-18 00:00:00,Request3,4
2022-11-18 00:00:00,/Request4,4
2022-11-18 00:00:00,Request5,4
2021-11-18 00:01:00,Request1,4
2021-11-18 00:02:00,Request1,2
2021-11-18 00:03:00,Request2,3
2022-11-18 00:04:00,Request3,4
2021-11-18 00:05:00,Request1,4

Please, provide a code example to create the dataframe. It is really easy, this can be made in 3 steps. Remember to write code examples with the code format in the post: {} symbol. — RobertoT, Nov 27 '21 at 19:24
@RobertoT - i have added the the code example for creating the test df — va0pjvbnd, Nov 27 '21 at 21:17

score 0 · Accepted Answer · answered Nov 27 '21 at 21:46

Your groupby line is missing a some key brackets\parens, I think.

The resample() method is something that gets applied to a groupby object, and so it should come after the groupby() is completely closed. That is, in general it would be df.groupby(...).resample(...), but you have the resample inside the groupby.

For the resample, you also need the RequestDate column to be your index (it may be in your example, but if not you need to do df = df.set_index('RequestDate')). Then you do the groupby on the Request column.

The RequestDate also needs to be saved as a datetime if it isn't already

All in all, I think you can get what you want like this:

df['RequestDate'] = pd.to_datetime(df['RequestDate']) # makes the column a datetime column, needed for resample
df = df.set_index('RequestDate')
df_grp = df.groupby('Request').resample('h').sum()
print(df_grp)

This gives the below. You'll notice that the columns aren't in the order you originally wanted: It looks like Request and RequestDate are set us as the multi-index. I'm sure you can tinker with rearranging them if need be, or go with this.

Request RequestDate Count
 Request 1  2021-11-24 22:00:00 14
 Request 2  2021-11-24 22:00:00 4
 Request 3  2021-11-24 22:00:00 7
 Request 4  2021-11-24 22:00:00 10
 Request 5  2021-11-24 22:00:00 1

Thank you @scotscotmcc. When i used this i realized that the actual data had few dates in 2022 and it was showing 0 for all the dates that were not present in that range till 2022. — va0pjvbnd, Nov 29 '21 at 15:22

How to resample a df with multiple columns

1 Answers1