I'm having trouble with Pandas groupby functionality and Time Series. I've read the documentation, but I can't see to figure out how to apply aggregate functions to multiple columns and calculate the mean of the volume (average) of the „aggregate “ correctly.
This is my code for import the CSV File:
#CSV Import
import pandas as pd
path = r'Z:\Python\30_Min_Data.txt'
from datetime import datetime
customdateparse = lambda x: datetime.strptime(x, '%Y/%m/%d %H:%M:%S.%f')
df = pd.read_csv(
path,
parse_dates={'DateTime': [0, 1]},
date_parser=customdateparse)
# Set the Date as the Index --> needed for Resampling
df.set_index('DateTime', inplace=True)
df.sort_index()
This is my DataFrame after the Import:
df
Out[3]:
Volume Session
DateTime
2020-12-16 08:00:00 1000 PRTH
2020-12-16 08:30:00 5000 PRTH
2020-12-16 09:00:00 1000 RTH
2020-12-16 09:30:00 3000 RTH
2020-12-17 08:00:00 2000 PRTH
2020-12-17 08:30:00 2000 PRTH
2020-12-17 09:00:00 2000 RTH
2020-12-17 09:30:00 2000 RTH
2020-12-18 08:00:00 1000 PRTH
2020-12-18 08:30:00 1000 PRTH
2020-12-18 09:00:00 1000 RTH
2020-12-18 09:30:00 1000 RTH
2019-11-18 08:00:00 1000 PRTH
2019-11-18 08:30:00 1000 PRTH
2019-11-18 09:00:00 1000 RTH
2019-11-18 09:30:00 1000 RTH
This is what I tried: It calculates the averages per day, because of the Time Series Resampling. I would expect that it sums up the values first and at the end calculates the means. But it does the mean on the whole bunch of data per day.
#2.Volume: Average per Year & Session & Day
funcs_year = lambda idx: idx.year
(df
.groupby([funcs_year,'Session', pd.Grouper(freq='D')])
['Volume']
.mean()
)
Out[6]:
Session DateTime
2019 PRTH 2019-11-18 1000
RTH 2019-11-18 1000
2020 PRTH 2020-12-16 3000
2020-12-17 2000
2020-12-18 1000
RTH 2020-12-16 2000
2020-12-17 2000
2020-12-18 1000
Name: Volume, dtype: int64
This how I want the result to be correctly calculated and displayed (i calculated it manually): Average (mean) volume traded per day (shown separately for the year and the session):
Year Session Mean Volume
2020 RTH 3.333,33
PRTH 4.000,00
2019 RTH 2.000,00
PRTH 2.000,00
Anyone know what I'm missing / doing wrong?