I'm trying to apply a one-sample proportion z-test to each row of my dataframe df, using the code below. Example data from df is shown further down. For each row, I want to compare the proportion in value against the proportion computed from the value in count and the number of trials in obs, and I want a p-value for each record. Instead, I seem to be getting the same p-value for every record. I've included a few rows of desired output below to illustrate what I mean. Can someone please point out what I'm doing wrong and how to fix it, or suggest a slicker way to do it? It really seems like there should be a way to do this with pandas.
# code:
import statsmodels.api as sm

def pvl(x):
    return sm.stats.proportions_ztest(x['count'],
                                      x['value'],
                                      x['obs'],
                                      alternative='larger')[1]

df['pval'] = df.apply(pvl, axis=1)
# sample data:
print(df)
count value obs
211.0 0.013354 15800.0
18.0 0.001139 15800.0
310.0 0.019620 15800.0
114.0 0.007215 15800.0
85.0 0.005380 15800.0
# sample output:
count value obs pval
211.0 0.013354 15800.0 0.5
18.0 0.001139 15800.0 0.5
310.0 0.019620 15800.0 0.5
114.0 0.007215 15800.0 0.5
85.0 0.005380 15800.0 0.5
# desired output:
count value obs pval
211.0 0.013354 15800.0 0.49
18.0 0.001139 15800.0 4.1454796845134295e-41
310.0 0.019620 15800.0 0.9999999999965842
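
For reference, here is a minimal sketch of the row-wise test with the arguments passed by keyword. It assumes the documented argument order `proportions_ztest(count, nobs, value, ...)` and the `statsmodels.stats.proportion` import path; the helper name `row_pval` is hypothetical, and the dataframe is rebuilt from the sample rows above. Note that in these sample rows value is essentially count / obs, so each row tests a proportion against itself, and a p-value close to 0.5 is what this call would be expected to return.

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# Sample rows copied from the data above.
df = pd.DataFrame({
    "count": [211.0, 18.0, 310.0, 114.0, 85.0],
    "value": [0.013354, 0.001139, 0.019620, 0.007215, 0.005380],
    "obs": [15800.0] * 5,
})

def row_pval(row):
    # Hypothetical helper: one-sample z-test for a single row.
    # Keywords make the mapping explicit: count -> count, obs -> nobs,
    # value -> the null-hypothesis proportion.
    _, pval = proportions_ztest(
        count=row["count"],
        nobs=row["obs"],
        value=row["value"],
        alternative="larger",
    )
    return pval

df["pval"] = df.apply(row_pval, axis=1)
print(df)
```

Passing `nobs` and `value` by name guards against swapping them positionally, which is easy to do when the dataframe has its own column called value.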