apply proportion z-test to each record in dataframe

Question

I have the code below, where I'm trying to apply a one-sample proportion z-test to the values in each row of my data. Example data from my dataframe df is below. For each row, I want to compare the proportion in value against the proportion observed from count successes out of obs trials, and I want a p-value for each record. Instead, I seem to be getting the same p-value for every record. A few rows of desired output are included below to illustrate what I mean. Can someone point out what I'm doing wrong and how to fix it, or suggest a slicker way to do it? It seems like there should be a way to do this with pandas.

# code:

import statsmodels.api as sm

def pvl(x):
    return sm.stats.proportions_ztest(x['count'],
                                      x['value'],
                                      x['obs'],
                                      alternative='larger')[1]

df['pval'] = df.apply(pvl, axis=1)

# sample data:

print(df)

count     value      obs
211.0  0.013354  15800.0
 18.0  0.001139  15800.0
310.0  0.019620  15800.0
114.0  0.007215  15800.0
 85.0  0.005380  15800.0

# sample output:

count     value      obs  pval
211.0  0.013354  15800.0  0.5
 18.0  0.001139  15800.0  0.5
310.0  0.019620  15800.0  0.5
114.0  0.007215  15800.0  0.5
 85.0  0.005380  15800.0  0.5

# desired output:

count     value      obs  pval
211.0  0.013354  15800.0  0.49
 18.0  0.001139  15800.0  4.1454796845134295e-41
310.0  0.019620  15800.0  0.9999999999965842

score 1 · Accepted Answer · answered Sep 28 '19 at 17:04

There is a mistake in your pvl function. The proportions_ztest() function from statsmodels takes its inputs in the order count, nobs, value, so your version passes the proportion as nobs and the number of trials as value. Therefore, you should define your function as:

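import statsmodels.api as sm

def pvl(x):
    # Signature order is count, nobs, value; keywords make the mapping explicit.
    return sm.stats.proportions_ztest(x['count'], nobs=x['obs'],
                                      value=x['value'], alternative='larger')[1]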
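To make the argument order concrete, here is a minimal pure-Python sketch of what the one-sample test computes for a single row. It assumes statsmodels' default prop_var=False, under which (to my understanding) the standard error uses the sample proportion count/nobs rather than value, and that alternative='larger' returns the upper-tail normal probability. The name one_sample_prop_ztest is a hypothetical helper for illustration, not a statsmodels function.

```python
import math


def one_sample_prop_ztest(count, nobs, value):
    """Hypothetical helper (not part of statsmodels): one-sample proportion
    z-test with alternative='larger', using the sample proportion in the
    standard error (assumed to mirror statsmodels' prop_var=False default)."""
    p_hat = count / nobs                            # observed proportion
    se = math.sqrt(p_hat * (1 - p_hat) / nobs)      # standard error from p_hat
    z = (p_hat - value) / se                        # distance from hypothesized value
    # Upper-tail probability of the standard normal: P(Z > z)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# First row of the sample data, arguments in (count, nobs, value) order
z, p = one_sample_prop_ztest(211.0, 15800.0, 0.013354)
print(z, p)
```

Swapping nobs and value in this sketch makes the failure mode easy to see: nobs becomes a tiny fraction, so count/nobs is no longer a proportion at all.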

However, with your data I'm getting results very close to 0.5, not the ones you listed as desired output. How did you get the second and third results? They seem wrong to me (unless I misunderstood your question).
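The near-0.5 results follow from the sample data itself: every value is count / obs rounded to six decimals (for example, 211 / 15800 = 0.0133544...), so each observed proportion matches the hypothesized one almost exactly, z is essentially 0, and the upper-tail p-value is essentially 0.5. Because the one-sample statistic is plain column arithmetic, it can also be computed for all rows at once instead of through df.apply. The sketch below assumes the same formula as statsmodels' default (sample proportion in the standard error) and uses scipy.stats.norm.sf for the upper tail; the extra z column is an illustrative addition.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Sample data from the question
df = pd.DataFrame({
    "count": [211.0, 18.0, 310.0, 114.0, 85.0],
    "value": [0.013354, 0.001139, 0.019620, 0.007215, 0.005380],
    "obs": [15800.0] * 5,
})

# Observed proportion for every row at once
p_hat = df["count"] / df["obs"]
# Standard error from the sample proportion (assumed to mirror prop_var=False)
se = np.sqrt(p_hat * (1 - p_hat) / df["obs"])
df["z"] = (p_hat - df["value"]) / se
# alternative='larger' corresponds to the upper-tail probability P(Z > z)
df["pval"] = norm.sf(df["z"])
print(df)
```

The df.apply version with the corrected pvl remains the reference; the vectorized form only avoids the per-row Python call.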

Thanks. Yeah, I noticed that I'd run the wrong numbers in my example. It turns out the p-values were all close to 0.5; it was a rounding issue. I was also using the wrong field as the value to compare against. So the code is right; I was feeding it the wrong numbers. Thanks. (user3476463, Sep 28 '19 at 19:38)
