I have a pandas DataFrame with a time column (Week) and a value column (Impressions). I want to compute the geometric mean and its standard error for the Impressions column, grouped by week. The caveat is that many Impressions values are zero. After looking online, I found that one possible way to mitigate this is to replace the zeros with 1s and then subtract 1 from the geometric mean.

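import numpy as np

def gmean(data):
    # Geometric mean via the mean of the logs, minus 1.
    # Assumes zeros in `data` were already replaced by 1.
    gmean_series = np.exp(np.mean(np.log(data))) - 1
    return gmean_series

def SE_gmean(data):
    n = len(data)
    # Sample standard deviation of the log-transformed values.
    log_std = np.std(np.log(data), ddof=1)
    # Standard error on the log scale, exponentiated back.
    se_geomean = np.exp(log_std / np.sqrt(n)) #- 1
    return se_geomean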
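
Note that the functions above do not replace the zeros themselves, so np.log(data) gives -inf for any zero entry unless the substitution has been done beforehand. Here is a minimal sketch of the substitution as I understand it, with zeros swapped for 1 before taking logs. The helper names gmean_zero_safe and gmean_shifted are hypothetical. The second function is a different variant (shift every value by 1 with log1p, then undo the shift with expm1), shown only for comparison. I am not claiming either one is statistically justified.

```python
import numpy as np

def gmean_zero_safe(data):
    """Hypothetical helper: replace zeros with 1, take the geometric mean, subtract 1."""
    values = np.asarray(data, dtype=float)
    # Swap exact zeros for 1 so that log(0) = -inf never occurs.
    values = np.where(values == 0, 1.0, values)
    return np.exp(np.mean(np.log(values))) - 1

def gmean_shifted(data):
    """Hypothetical variant: add 1 to every value, average the logs, subtract 1."""
    values = np.asarray(data, dtype=float)
    # log1p(x) = log(1 + x) and expm1(y) = exp(y) - 1.
    return np.expm1(np.mean(np.log1p(values)))
```

The two helpers generally give different numbers, because the first changes only the zero entries while the second shifts every entry.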

Can anyone confirm whether these are right? After defining them, I apply them to each weekly group of the dataframe using groupby and transform:

df['GM'] = df.groupby(time_col)[value_col].transform(gmean)
df['Std error GM'] = df.groupby(time_col)[value_col].transform(SE_gmean)
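
To make the groupby step concrete, here is a small self-contained sketch. The data is made up purely for illustration (zeros are assumed to be already replaced by 1), and the `weekly` variable is not part of my real code. It shows that transform broadcasts the per-week value back to every row, while agg returns one value per week.

```python
import numpy as np
import pandas as pd

def gmean(data):
    # Same definition as in the question: geometric mean minus 1.
    return np.exp(np.mean(np.log(data))) - 1

# Made-up illustrative data, not real impressions.
df = pd.DataFrame({
    "Week": [1, 1, 2, 2],
    "Impressions": [1, 4, 2, 8],
})
time_col, value_col = "Week", "Impressions"

# transform: the group-level result is repeated on every row of the group.
df["GM"] = df.groupby(time_col)[value_col].transform(gmean)

# agg: one result per week (hypothetical alternative if one row per week is enough).
weekly = df.groupby(time_col)[value_col].agg(gmean)
```

If only one number per week is needed, the agg form avoids repeating the same value on every row.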
pandi20

0 Answers