1

I have a dataframe that consists of 15 min data of each day for an entire month. Where DateTime column is a timestamp data type and the dataframe looks like this:

DateTime, Temperature, Moisture, Accumulated precipitation

1/01/2022 00:00, 23, 50, 2,
1/01/2022 00:15, 22, 45, 1,
1/01/2022 00:30, 20, 39, 0,
1/01/2022 01:00, 25, 34, 0,
1/01/2022 01:15, 23, 50, 0,
.
.
.
31/01/2022 00:00, 23, 45, 0,
31/01/2022 00:15, 22, 45, 1,
31/01/2022 00:30, 20, 39, 0,
31/01/2022 01:00, 25, 34, 0,

Therefore, how could I resample the dataframe in order to get an average value of each 15 min data for the entire month?. The expected result for December should be:

DateTime Temperature
00.00 18.4
00:15 15.4
00:30 16.5
.
.
.
23:45 15.0
Maria Fernanda
  • 143
  • 2
  • 8

1 Answers1

1

You can create a new column called 'time' by separating the time from the 'DateTime' column, then perform a mean groupby on the 'time' column. For example:

import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({
    'DateTime':pd.date_range('1/1/2000', periods=4*24*30, freq='15min'),
    'Temperature':np.random.uniform(low=0.0, high=100.0, size=4*24*30),
    'Moisture':np.random.uniform(low=0.0, high=100.0, size=4*24*30)
})
df['Time'] = df['DateTime'].dt.time
result = df.groupby('Time').mean().reset_index() 

Output:

>>> result
        Time  Temperature   Moisture
0   00:00:00    53.906354  49.702393
1   00:15:00    58.617754  46.533957
2   00:30:00    42.697347  47.283152
3   00:45:00    48.215645  46.413589
4   01:00:00    51.422462  51.100578
..       ...          ...        ...
91  22:45:00    42.881353  43.369940
92  23:00:00    48.179361  54.289938
93  23:15:00    46.109330  48.163814
94  23:30:00    53.840951  62.281667
95  23:45:00    58.671288  43.572069
Derek O
  • 16,770
  • 4
  • 24
  • 43
  • 1
    That's a good idea! Thank you, I have another doubt. For some days in DateTime column is in this format `YYYY-mm-dd 23:29:59.999000` but in other ones is in this format `YYYY-mm-dd 23:30:00` (correct one). I'm importing the file just with this line of code: `data = pd.read_excel(filename, usecols ='A:C', skiprows=1)` – Maria Fernanda Jan 31 '22 at 01:13
  • 1
    You could first round your DateTime column to the nearest 15 minutes with the following: `df['DateTime'] = df['DateTime'].dt.round("15min")` – Derek O Jan 31 '22 at 01:20
  • 1
    Sure! Something else, I would like to do is: `data = result.set_index(pd.DatetimeIndex(result['Time'])).between_time('17:00','23:00')`. However, I get the following error: ` is not convertible to datetime`. I know the reason is due to it only applies to Datetimeindex, not timestap, but do you have any ideas to fix this? – Maria Fernanda Jan 31 '22 at 03:18
  • 1
    @MariaFernanda if you want to check for a time being between certain hours, I think you could try something like `import datetime` and then `(result['Time'] < datetime.time(23,0)) & (result['Time'] > datetime.time(17,0))` – Derek O Jan 31 '22 at 03:47