I have a pandas DataFrame with a time column (Week) and a value column (Impressions). I want to compute the geometric mean of Impressions and its standard error, grouped by week. The caveat is that many Impressions values are zero, and the logarithm of zero is undefined. After looking online, I found that one possible workaround is to replace the zeros with 1s and then subtract 1 from the resulting geometric mean:
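import numpy as np

def gmean(data):
    # Replace zeros by 1 so the log is defined, then subtract 1 from the result
    data = np.where(data == 0, 1, data)
    return np.exp(np.mean(np.log(data))) - 1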
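def SE_gmean(data):
    # Replace zeros by 1 so the log is defined
    data = np.where(data == 0, 1, data)
    n = len(data)
    # Sample standard deviation of the logged values
    log_std = np.std(np.log(data), ddof=1)
    # Back-transformed standard error of the mean of the logs
    se_geomean = np.exp(log_std / np.sqrt(n))  # - 1
    return se_geomean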
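Written out as formulas, this is what I understand the two functions to compute. Here $y_i$ stands for the $n$ values within one week that are actually passed to the logarithm; the symbols are my own notation, not taken from a reference.

```latex
\mathrm{GM} = \exp\!\Big(\frac{1}{n}\sum_{i=1}^{n} \ln y_i\Big) - 1,
\qquad
s_{\ln} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\big(\ln y_i - \overline{\ln y}\big)^2},
\qquad
\mathrm{SE}_{\mathrm{GM}} = \exp\!\Big(\frac{s_{\ln}}{\sqrt{n}}\Big)
```

As far as I can tell, the last quantity is a multiplicative factor on the geometric mean rather than an additive standard error, which is part of why I am unsure whether the commented-out `- 1` belongs there.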
Can anyone confirm whether these are right? After defining them, I apply them to the dataframe per week using groupby and transform:
time_col, value_col = 'Week', 'Impressions'
df['GM'] = df.groupby(time_col)[value_col].transform(gmean)
df['Std error GM'] = df.groupby(time_col)[value_col].transform(SE_gmean)
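To make the question reproducible, here is a minimal self-contained sketch of the same idea on made-up toy data. The names `gmean_zero_as_one`, `se_factor_zero_as_one`, and `toy`, as well as the numbers, are placeholders for illustration only and are not my real data. It assumes the "replace zeros by 1, then subtract 1" approach described above, which may or may not be the right way to handle zeros.

```python
import numpy as np
import pandas as pd


def gmean_zero_as_one(values):
    """Geometric mean after replacing zeros by 1, minus 1 (assumed approach)."""
    values = np.where(values == 0, 1, values)
    return np.exp(np.mean(np.log(values))) - 1


def se_factor_zero_as_one(values):
    """exp(std of logs / sqrt(n)), using the same zero replacement."""
    values = np.where(values == 0, 1, values)
    log_std = np.std(np.log(values), ddof=1)  # sample std of the logged values
    return np.exp(log_std / np.sqrt(len(values)))


# Made-up toy data: two weeks, one zero in week 1
toy = pd.DataFrame({
    "Week": [1, 1, 1, 2, 2, 2],
    "Impressions": [0, 4, 16, 1, 10, 100],
})

# One row per week instead of broadcasting back to every row
grouped = toy.groupby("Week")["Impressions"]
summary = pd.DataFrame({
    "GM": grouped.apply(gmean_zero_as_one),
    "SE_factor": grouped.apply(se_factor_zero_as_one),
})
print(summary)  # GM is about 3 for week 1 and about 9 for week 2
```

In week 1 the zero becomes 1, so the logged values are those of 1, 4, and 16, whose geometric mean is 4; subtracting 1 gives about 3. Using `apply` here gives one summary row per week, whereas `transform` repeats the same value on every row of the group.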