Resample non-continuous 15-minute data to hourly grouped data in pandas

Question

I have a dataframe with 20 years of precipitation data for 5 stations. The observations are collected every 15 minutes, non-continuously, for 8 hours a day. I want to find the station and the day for which the total precipitation over those 8 hours (7am to 10am, 13 to 14, and 16 to 17) is the maximum.

What I want to do is first resample the data to hourly data for each station, then find the overlapping 8 hours for each location, and then find the maximum value.

My data frame:

time_start	obs_id	station_id	precipition
2000-01-11 07:00:00-05:00	1	st_1	10
2000-01-11 07:30:00-05:00	1	st_1	2
2000-01-11 07:45:00-05:00	1	st_1	1
2000-01-11 09:00:00-05:00	1	st_1	3
2000-01-11 09:15:00-05:00	1	st_1	1
2000-01-11 09:30:00-05:00	1	st_1	0
2000-01-11 09:45:00-05:00	1	st_1	1
2000-01-12 07:00:00-05:00	1	st_1	10
2000-01-12 07:30:00-05:00	2	st_1	2
2000-01-12 07:45:00-05:00	2	st_1	1
2000-01-12 09:00:00-05:00	2	st_1	3
2000-01-12 09:15:00-05:00	2	st_1	1
2000-01-12 09:30:00-05:00	2	st_1	0
2000-01-12 09:45:00-05:00	2	st_1	1
2000-01-11 07:00:00-05:00	1	st_2	10
2000-01-11 07:30:00-05:00	1	st_2	2
2000-01-11 07:45:00-05:00	1	st_2	1
2000-01-11 09:00:00-05:00	1	st_2	3
2000-01-11 09:15:00-05:00	1	st_2	1
2000-01-11 09:30:00-05:00	1	st_2	0
2000-01-11 09:45:00-05:00	1	st_2	1
2000-01-12 07:00:00-05:00	1	st_2	10
2000-01-12 07:30:00-05:00	2	st_2	2
2000-01-12 07:45:00-05:00	2	st_2	1
2000-01-12 09:00:00-05:00	2	st_2	3
2000-01-12 09:15:00-05:00	2	st_2	1
2000-01-12 09:30:00-05:00	2	st_2	0
2000-01-12 09:45:00-05:00	2	st_2	1

I used this code, but it does not work:

df_H = df.resample('H', on='time_start', closed='right').sum().reset_index()

I want a table that has, for each location, the times in sorted order with the sum of precipitation.

score 1 · Accepted Answer · answered Aug 28 '21 at 18:49

You can group by station_id using .groupby() and then resample using DataFrameGroupBy.resample(), as follows:

df_H = df.groupby('station_id').resample('H', on='time_start', closed='right')['precipition'].sum().reset_index()

Result:

print(df_H)

   station_id                time_start  precipition
0        st_1 2000-01-11 06:00:00-05:00           10
1        st_1 2000-01-11 07:00:00-05:00            3
2        st_1 2000-01-11 08:00:00-05:00            3
3        st_1 2000-01-11 09:00:00-05:00            2
4        st_1 2000-01-11 10:00:00-05:00            0
5        st_1 2000-01-11 11:00:00-05:00            0
6        st_1 2000-01-11 12:00:00-05:00            0
7        st_1 2000-01-11 13:00:00-05:00            0
8        st_1 2000-01-11 14:00:00-05:00            0
9        st_1 2000-01-11 15:00:00-05:00            0
10       st_1 2000-01-11 16:00:00-05:00            0
11       st_1 2000-01-11 17:00:00-05:00            0
12       st_1 2000-01-11 18:00:00-05:00            0
13       st_1 2000-01-11 19:00:00-05:00            0
14       st_1 2000-01-11 20:00:00-05:00            0
15       st_1 2000-01-11 21:00:00-05:00            0
16       st_1 2000-01-11 22:00:00-05:00            0
17       st_1 2000-01-11 23:00:00-05:00            0
18       st_1 2000-01-12 00:00:00-05:00            0
19       st_1 2000-01-12 01:00:00-05:00            0
20       st_1 2000-01-12 02:00:00-05:00            0
21       st_1 2000-01-12 03:00:00-05:00            0
22       st_1 2000-01-12 04:00:00-05:00            0
23       st_1 2000-01-12 05:00:00-05:00            0
24       st_1 2000-01-12 06:00:00-05:00           10
25       st_1 2000-01-12 07:00:00-05:00            3
26       st_1 2000-01-12 08:00:00-05:00            3
27       st_1 2000-01-12 09:00:00-05:00            2
28       st_2 2000-01-11 06:00:00-05:00           10
29       st_2 2000-01-11 07:00:00-05:00            3
30       st_2 2000-01-11 08:00:00-05:00            3
31       st_2 2000-01-11 09:00:00-05:00            2
32       st_2 2000-01-11 10:00:00-05:00            0
33       st_2 2000-01-11 11:00:00-05:00            0
34       st_2 2000-01-11 12:00:00-05:00            0
35       st_2 2000-01-11 13:00:00-05:00            0
36       st_2 2000-01-11 14:00:00-05:00            0
37       st_2 2000-01-11 15:00:00-05:00            0
38       st_2 2000-01-11 16:00:00-05:00            0
39       st_2 2000-01-11 17:00:00-05:00            0
40       st_2 2000-01-11 18:00:00-05:00            0
41       st_2 2000-01-11 19:00:00-05:00            0
42       st_2 2000-01-11 20:00:00-05:00            0
43       st_2 2000-01-11 21:00:00-05:00            0
44       st_2 2000-01-11 22:00:00-05:00            0
45       st_2 2000-01-11 23:00:00-05:00            0
46       st_2 2000-01-12 00:00:00-05:00            0
47       st_2 2000-01-12 01:00:00-05:00            0
48       st_2 2000-01-12 02:00:00-05:00            0
49       st_2 2000-01-12 03:00:00-05:00            0
50       st_2 2000-01-12 04:00:00-05:00            0
51       st_2 2000-01-12 05:00:00-05:00            0
52       st_2 2000-01-12 06:00:00-05:00           10
53       st_2 2000-01-12 07:00:00-05:00            3
54       st_2 2000-01-12 08:00:00-05:00            3
55       st_2 2000-01-12 09:00:00-05:00            2
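
For the second part of the question (which station and day has the largest daily total), one possible follow-up is to sum per station and calendar day and take the largest entry. This is a minimal sketch, not part of the original answer: the sample values below are made up for illustration, and it assumes every recorded observation belongs to the daily window of interest, so no hour filter is applied. Because the hourly sums add up to the same daily total, the sketch sums the raw 15-minute rows directly.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data
# (values are illustrative, not taken from the question).
rows = [
    ("2000-01-11 07:00:00-05:00", "st_1", 10),
    ("2000-01-11 07:30:00-05:00", "st_1", 2),
    ("2000-01-11 09:00:00-05:00", "st_1", 3),
    ("2000-01-12 07:00:00-05:00", "st_1", 4),
    ("2000-01-12 09:15:00-05:00", "st_1", 1),
    ("2000-01-11 07:00:00-05:00", "st_2", 6),
    ("2000-01-11 09:45:00-05:00", "st_2", 1),
    ("2000-01-12 07:00:00-05:00", "st_2", 20),
    ("2000-01-12 07:45:00-05:00", "st_2", 5),
]
df = pd.DataFrame(rows, columns=["time_start", "station_id", "precipition"])
df["time_start"] = pd.to_datetime(df["time_start"])

# Local calendar day of each observation (the -05:00 offset is kept).
day = df["time_start"].dt.date.rename("day")

# Total precipitation per station and day.
daily = df.groupby(["station_id", day])["precipition"].sum()

# Index label (station, day) of the largest daily total.
best_station, best_day = daily.idxmax()
print(best_station, best_day, daily.max())  # st_2 2000-01-12 25
```

If only certain hours should count, the rows could be filtered on `df["time_start"].dt.hour` before grouping.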
