I'm trying to apply a z-test to every row of a pandas dataframe like so:

from statsmodels.stats.proportion import proportions_ztest    

def apply_ztest(n1, n2, x1, x2):
    return proportions_ztest(
        count=[n1, n2],
        nobs=[x1, x2],
        alternative='larger'
    )[1]

df['p_val'] = df.apply(lambda x: apply_ztest(df['n1'], df['n2'], df['obs1'], df['obs2']))

But I'm getting this error:

NotImplementedError: more than two samples are not implemented yet

I feel like either there's a different way to do this or I'm messing something up. Can anyone tell me how to do/fix this?

goose

1 Answer

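You're passing whole columns to your function by writing df['n1'] where you should write x['n1']: df['n1'] is the entire column, whereas x is whatever apply hands to the lambda. Because every argument is a full column, proportions_ztest sees far more than two samples, hence the NotImplementedError. You also need to specify axis=1 so that apply runs over rows instead of columns.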
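To see what axis changes, here is a minimal sketch on a hypothetical two-row frame (the values are made up purely for illustration):

```python
import pandas as pd

# Hypothetical data, only used to show what apply() passes to the function.
df = pd.DataFrame({"n1": [3, 5], "obs1": [10, 20]})

# Default axis=0: the function is called once per column, so x is a whole column.
per_column = df.apply(lambda x: len(x))

# axis=1: the function is called once per row, so x["n1"] is a single value.
per_row = df.apply(lambda x: x["n1"] / x["obs1"], axis=1)

print(per_column.to_dict())  # {'n1': 2, 'obs1': 2}
print(per_row.tolist())      # [0.3, 0.25]
```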

I think you might also have the count and nobs arguments to proportions_ztest backwards, but I'm not sure: count should be the number of successes and nobs the number of trials. Here's a small working example:

from statsmodels.stats.proportion import proportions_ztest
import pandas as pd
import numpy as np

def apply_ztest(c1, c2, n1, n2):
    return proportions_ztest(
        count=[c1, c2],
        nobs=[n1, n2],
        alternative='larger'
    )[1]

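# create fake data: counts (successes) c1, c2 and totals (trials) n1, n2
np.random.seed(1)
df = pd.DataFrame({
    'c1': np.random.randint(1, 20, 10),
    'c2': np.random.randint(1, 50, 10),
})
# add a positive offset so that each n is larger than its c
df['n1'] = df['c1'] + np.random.randint(1, 20, 10)
df['n2'] = df['c2'] + np.random.randint(1, 50, 10)

# axis=1 passes one row at a time to the lambda
df['p_val'] = df.apply(lambda my_row: apply_ztest(my_row['c1'], my_row['c2'], my_row['n1'], my_row['n2']), axis=1)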
print(df)

mitoRibo
  • Yes, correct on all counts. Got it working now thanks to the above. I can see 'row' must be a keyword with special meaning now, which I hadn't realised. I'll check out the docs. – goose May 17 '22 at 07:12
  • `row` is not a special keyword! You can use any name you want, as long as it's defined in the lambda and then used on the right side – mitoRibo May 17 '22 at 17:16
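
A quick way to convince yourself of this is the sketch below. It uses a hypothetical two-row frame and two arbitrary parameter names, and both calls give the same result:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"c1": [1, 2], "n1": [4, 8]})

# The lambda's parameter name is arbitrary; it only has to match on both sides of the colon.
with_row = df.apply(lambda row: row["c1"] / row["n1"], axis=1)
with_other = df.apply(lambda anything: anything["c1"] / anything["n1"], axis=1)

print(with_row.equals(with_other))  # True
```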