1

I have a data frame like so

Date_Time Level
2017-08-08 23:55:01 239.0
2017-08-08 23:50:01 242.0
2017-08-08 23:45:01 246.0
2017-08-08 23:40:01 250.0
2017-08-08 23:35:01 254.0
... ...
2017-07-26 00:23:57 72.0
2017-07-26 00:18:57 67.0
2017-07-26 00:13:57 64.0
2017-07-26 00:08:57 64.0
2017-07-26 00:03:57 65.0

I want to calculate the average level on every day, during the waking hours and overnight hours

Date Time AvgLevel
2017-08-08 00:00:00 - 06:00:00 178
2017-08-08 06:00:01 - 23:59:99 190
2017-09-08 00:00:00 - 06:00:00 174
2017-09-08 06:00:01 - 23:59:99 200

I've already tried splitting into separate tables and using for loops however that uses too much memory and takes too much time

Shawn Mian
  • 13
  • 2

3 Answers3

1

You can use np.where to differentiate between waking hours and overnight hours

Creating sample data

data = {
    'Date_Time': [
        '2017-08-08 00:00:00', '2017-08-08 23:50:01', '2017-08-08 06:45:01',
        '2017-08-08 06:00:00', '2017-08-08 00:35:01',
        '2017-07-26 00:23:57', '2017-07-26 00:18:57', '2017-07-26 07:13:57',
        '2017-07-26 00:08:57', '2017-07-26 07:03:57'
    ],
    'Level': [239.0, 242.0, 246.0, 250.0, 254.0, 72.0, 67.0, 64.0, 64.0, 65.0]
}

df = pd.DataFrame(data, columns=['Date_Time', 'Level'])
df['Date_Time'] = pd.to_datetime(df['Date_Time'])
df = df.set_index('Date_Time')
print(df)

                     Level
Date_Time                 
2017-08-08 00:00:00  239.0
2017-08-08 23:50:01  242.0
2017-08-08 06:45:01  246.0
2017-08-08 06:00:00  250.0
2017-08-08 00:35:01  254.0
2017-07-26 00:23:57   72.0
2017-07-26 00:18:57   67.0
2017-07-26 07:13:57   64.0
2017-07-26 00:08:57   64.0
2017-07-26 07:03:57   65.0

Creating a mask of waking hours and overnight hours

mask = (df.index.time >= pd.to_datetime('00:00:00').time()) & (df.index.time <= pd.to_datetime('06:00:00').time())
df['Period'] = np.where(mask, '00:00:00 - 06:00:00', '06:00:01 - 23:59:59')
df

                     Level               Period
Date_Time                                      
2017-08-08 00:00:00  239.0  00:00:00 - 06:00:00
2017-08-08 23:50:01  242.0  06:00:01 - 23:59:59
2017-08-08 06:45:01  246.0  06:00:01 - 23:59:59
2017-08-08 06:00:00  250.0  00:00:00 - 06:00:00
2017-08-08 00:35:01  254.0  00:00:00 - 06:00:00
2017-07-26 00:23:57   72.0  00:00:00 - 06:00:00
2017-07-26 00:18:57   67.0  00:00:00 - 06:00:00
2017-07-26 07:13:57   64.0  06:00:01 - 23:59:59
2017-07-26 00:08:57   64.0  00:00:00 - 06:00:00
2017-07-26 07:03:57   65.0  06:00:01 - 23:59:59

Groupby the Date_Time and Period column and calculate average Level

result = df.groupby([df.index.date, 'Period'])['Level'].mean().reset_index()
result.columns = ['Date', 'Time', 'AvgLevel']
result

         Date                 Time    AvgLevel
0  2017-07-26  00:00:00 - 06:00:00   67.666667
1  2017-07-26  06:00:01 - 23:59:59   64.500000
2  2017-08-08  00:00:00 - 06:00:00  247.666667
3  2017-08-08  06:00:01 - 23:59:59  244.000000
Ashyam
  • 666
  • 6
  • 13
1

You can do the following:

import pandas as pd

df = pd.read_csv("data.csv", sep=";")
print(df)
df["Date_Time"] = pd.to_datetime(df["Date_Time"])

df["Date"] = df["Date_Time"].dt.date
df["Time"] = df["Date_Time"].dt.time

df["Time_Period"] = "Overnight"
df.loc[(df["Time"] >= pd.to_datetime("06:00:00").time()) & (df["Time"] <= pd.to_datetime("23:59:59").time()), "Time_Period"] = "Waking"

grouped = df.groupby(["Date", "Time_Period"])["Level"].mean().reset_index()

grouped = grouped.rename(columns={"Date": "Date", "Time_Period": "Time", "Level": "AvgLevel"})

grouped["Time"] = grouped["Time"].map({
    "Waking": "06:00:01 - 23:59:99",
    "Overnight": "00:00:00 - 06:00:00"
})

print(grouped)

Basically, you group entries by times that are nightly and daily:

This results in (I assume here that your expected outcome you print is for the entire dataframe):

         Date                 Time  AvgLevel
0  2017-07-26  00:00:00 - 06:00:00      66.4
1  2017-08-08  06:00:01 - 23:59:99     246.2
0

by using Pandas freq option, mean, sum etc. can be calculated for an equal portion of time i.e. freq='H' for hourly calculation, freq='12H' for 12 hourly calculation, freq='D' for daily calculation and freq='BH' for business hoursly calculations. Example is below:

avg_12_hours = df.groupby(pd.Grouper(freq='12H', key='Date_Time'))['Level'].mean()

Since, you are asking for a calculation period which is not equally splitted so, you need to do some custom calculations