0

I have a df with a column with dates (data) and column with average daily change of price (DailyChange) and I need to group this df by year and for each year count the geometric mean from column DailyChange.

This is the part of the code :

df = pd.read_csv("data_with_changes.csv")
df["data"] = pd.to_datetime(df["data"])

df_result = df.groupby(pd.Grouper(key="data", freq="Y"))["DailyChange"].apply(lambda x: stats.gmean(x.dropna() + 100) - 100).reset_index()

df_result.columns = ["Year", "GeometricMean"]
df_result["Year"] = df_result["Year"].dt.year
df_result.to_csv("average_daily_change.csv", index=False)

Don't know where is the problem, if in grouping or in using the stats.gmean function wrong.

Timus
  • 10,974
  • 5
  • 14
  • 28
Jiri
  • 1
  • 1

1 Answers1

0

ou need to subtract 100 from the geometric mean result to convert it back to a percentage change

import pandas as pd
import scipy.stats as stats

df = pd.read_csv("data_with_changes.csv")
df["data"] = pd.to_datetime(df["data"])

df_result = df.groupby(pd.Grouper(key="data", freq="Y"))["DailyChange"].apply(lambda x: stats.gmean(x.dropna()) - 100).reset_index()

df_result.columns = ["Year", "GeometricMean"]

df_result["Year"] = df_result["Year"].dt.year

df_result.to_csv("average_daily_change.csv", index=False)
Shanu
  • 124
  • 4
  • like that its giving me NaN values. With my original function, its giving me values, but also negative ones, which is also wrong. – Jiri Jun 22 '23 at 10:01