Standard error of geometric mean of values in Pandas after groupby

Asked Apr 03 '23 at 07:41

Active Apr 03 '23 at 07:41

Viewed 36 times

I have a pandas DataFrame with a time column (Week) and a value column (Impressions). I want to find the geometric mean of the Impressions column and its standard error, grouped by week. The caveat is that many Impressions values are zero. After looking online, I found that a possible way to mitigate this is to replace the zeros with 1s and then subtract 1 from the geometric mean:

import numpy as np

def gmean(data):
    # Geometric mean via the mean of the logs, minus 1 for the zero adjustment
    return np.exp(np.mean(np.log(data))) - 1

def SE_gmean(data):
    n = len(data)
    # Sample standard deviation of the log-values
    log_std = np.std(np.log(data), ddof=1)
    # Standard error of the mean log, back-transformed (a multiplicative factor)
    se_geomean = np.exp(log_std / np.sqrt(n))  # - 1
    return se_geomean

Can anyone confirm whether these are right? After defining them, I apply them to the dataframe per group using groupby and transform:

df['GM'] = df.groupby(time_col)[value_col].transform(gmean)
df['Std error GM'] = df.groupby(time_col)[value_col].transform(SE_gmean)
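
For reference, here is a minimal, self-contained sketch of the whole pipeline on made-up data. The functions as posted call np.log on the raw values, so a zero produces -inf (and gmean returns -1 for that group). The sketch therefore makes the zero replacement explicit. The helper name log_with_zeros_replaced, the *_zero_safe function names, and the toy numbers are assumptions for illustration only, and the sketch does not settle whether subtracting 1 is statistically appropriate.

```python
import numpy as np
import pandas as pd

def log_with_zeros_replaced(data):
    # Hypothetical helper: replace zeros by 1, then take the natural log
    values = np.asarray(data, dtype=float)
    values = np.where(values == 0, 1.0, values)
    return np.log(values)

def gmean_zero_safe(data):
    # Geometric mean after zero replacement, minus 1 as described in the question
    return np.exp(np.mean(log_with_zeros_replaced(data))) - 1

def se_factor_gmean_zero_safe(data):
    # Multiplicative standard-error factor: exp(sd(log x) / sqrt(n))
    logs = log_with_zeros_replaced(data)
    return np.exp(np.std(logs, ddof=1) / np.sqrt(len(logs)))

# Toy data, made up for illustration
df = pd.DataFrame({
    "Week": [1, 1, 1, 2, 2, 2],
    "Impressions": [0, 10, 100, 4, 0, 16],
})

# transform broadcasts each group's scalar result back to every row of the group
grouped = df.groupby("Week")["Impressions"]
df["GM"] = grouped.transform(gmean_zero_safe)
df["Std error GM"] = grouped.transform(se_factor_gmean_zero_safe)
print(df)
```

Note that exp(sd(log x) / sqrt(n)) is a multiplicative factor (always at least 1), not an additive standard error on the original scale. One common approximation for an additive value is the delta method, which gives roughly GM * sd(log x) / sqrt(n).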


pandi20

