How to group dataframe and calculate geomean?

Question

I have a DataFrame with a column of dates (data) and a column with the average daily price change (DailyChange). I need to group it by year and, for each year, calculate the geometric mean of the DailyChange column.

This is the relevant part of the code:

df = pd.read_csv("data_with_changes.csv")
df["data"] = pd.to_datetime(df["data"])

df_result = df.groupby(pd.Grouper(key="data", freq="Y"))["DailyChange"].apply(lambda x: stats.gmean(x.dropna() + 100) - 100).reset_index()

df_result.columns = ["Year", "GeometricMean"]
df_result["Year"] = df_result["Year"].dt.year
df_result.to_csv("average_daily_change.csv", index=False)

I don't know where the problem is: in the grouping, or in how I'm using the stats.gmean function.

Please format your posts properly. — Timus, Jun 22 '23 at 12:14

score 0 · Answer 1 · answered Jun 22 '23 at 09:25

You need to subtract 100 from the geometric mean result to convert it back to a percentage change:

import pandas as pd
import scipy.stats as stats

df = pd.read_csv("data_with_changes.csv")
df["data"] = pd.to_datetime(df["data"])

df_result = df.groupby(pd.Grouper(key="data", freq="Y"))["DailyChange"].apply(lambda x: stats.gmean(x.dropna()) - 100).reset_index()

df_result.columns = ["Year", "GeometricMean"]

df_result["Year"] = df_result["Year"].dt.year

df_result.to_csv("average_daily_change.csv", index=False)

Like that, it's giving me NaN values. With my original function, it's giving me values, but also negative ones, which is also wrong. — Jiri, Jun 22 '23 at 10:01
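
For reference, here is a minimal sketch of one way to average percentage changes geometrically. It assumes DailyChange holds percent values (for example, 1.5 means +1.5 %), which the thread does not confirm. The sample data and the helper name geometric_mean_change are made up for illustration. The logarithm of a negative number is undefined, so taking the geometric mean of the raw changes is consistent with the NaN results reported above. Shifting to growth factors first (equivalent to the + 100 / - 100 in the question) avoids that. Under this assumption, a negative result is not necessarily an error: it would mean the price fell on average during that year.

```python
import numpy as np
import pandas as pd

# Made-up sample data; "data" and "DailyChange" mirror the column names in the question.
df = pd.DataFrame({
    "data": pd.to_datetime(["2021-01-04", "2021-01-05", "2022-01-03", "2022-01-04"]),
    "DailyChange": [10.0, -10.0, 5.0, 5.0],  # assumed to be percent values
})

def geometric_mean_change(changes: pd.Series) -> float:
    """Average percent change per period via the geometric mean of growth factors."""
    factors = 1 + changes.dropna() / 100  # +10 % -> 1.10, -10 % -> 0.90
    # exp(mean(log(f))) is the geometric mean; every factor must be > 0.
    return (np.exp(np.log(factors).mean()) - 1) * 100

# Grouping on the year number directly avoids depending on a frequency alias.
df_result = (
    df.groupby(df["data"].dt.year)["DailyChange"]
    .apply(geometric_mean_change)
    .rename_axis("Year")
    .reset_index(name="GeometricMean")
)
print(df_result)
```

In this sample, 2021 has one +10 % day and one -10 % day. The combined growth factor is 1.10 * 0.90 = 0.99, so the geometric mean change comes out slightly below zero even though the arithmetic mean of the two changes is exactly zero.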
