
I need a quick way to apply a t-test across multiple groups and multiple variables. Let's assume I have a table like this:

import pandas as pd
df = pd.DataFrame({'group': 'a a b b'.split(), 'B': [1, 2, 3, 4], 'C': [4, 6, 5, 10]})
print(df)

The group column contains a control and its variants: a = control, b = variant(s).

Column B is one metric, column C is another, and I have many more metrics, so I need to loop through N columns.

I want to group by the 'group' column so that, for each metric (starting with column B), I always compare the control against one of the variants using the ttest_ind function.

Is there a solution using for loops or .apply()? Ideally, I'd like to do something like:

df.groupby('group').apply(ttest_ind(control, n_columns))  # pseudocode, not valid pandas/SciPy
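A minimal sketch of the per-column loop described above, assuming the control label is 'a', the single variant is 'b', and every column other than 'group' is a metric. It runs `scipy.stats.ttest_ind` once per metric column and collects the t-statistic and p-value into a DataFrame; the names `metrics`, `rows` and `results` are illustrative choices, not part of the original question.

```python
import pandas as pd
from scipy.stats import ttest_ind

df = pd.DataFrame({'group': 'a a b b'.split(), 'B': [1, 2, 3, 4], 'C': [4, 6, 5, 10]})

# Assumption: 'a' is the control, 'b' is the variant, all other columns are metrics.
control = df[df['group'] == 'a']
variant = df[df['group'] == 'b']
metrics = [col for col in df.columns if col != 'group']

rows = []
for col in metrics:
    # Two-sample t-test of control vs variant for this one metric.
    res = ttest_ind(control[col], variant[col])
    rows.append({'metric': col, 't_stat': res.statistic, 'p_value': res.pvalue})

# One row per metric, indexed by the metric's column name.
results = pd.DataFrame(rows).set_index('metric')
print(results)
```

Collecting plain dicts and building the DataFrame once at the end avoids growing a DataFrame row by row inside the loop.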
Henry Ecker
Alx

1 Answer


You can do

from scipy import stats
df.groupby('group').apply(lambda x: stats.ttest_ind(x['B'], x['C']))
Out[96]: 
group
a    (-3.1304951684997055, 0.08867762313423291)
b     (-1.5689290811054724, 0.2572186472917924)
dtype: object
BENY
  • Sorry, I wasn't clear. Column B is a metric and column C is another metric. I wanted to group by the 'group' column because it contains the control ('a') and the variants ('b'). Also, my real dataset has many more metric columns than just B and C, so I need to loop through all n columns. – Alx Feb 19 '22 at 06:15
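Following the clarification in the comment, a sketch that generalizes to any number of variants: every non-control group is compared against the control on every metric column. It assumes the control label is 'a' and that all non-group columns are numeric metrics; `ttest_vs_control` is a hypothetical helper name. Passing 2-D arrays with `axis=0` lets `scipy.stats.ttest_ind` run one test per column in a single call, so there is no inner loop over metrics.

```python
import pandas as pd
from scipy.stats import ttest_ind


def ttest_vs_control(df, group_col='group', control='a'):
    """Compare every non-control group against the control on every metric column.

    Assumption: every column other than `group_col` is a numeric metric.
    """
    metrics = df.columns.drop(group_col)
    control_vals = df.loc[df[group_col] == control, metrics].to_numpy()
    rows = []
    # Iterate over each variant group (everything that is not the control).
    for variant, variant_df in df[df[group_col] != control].groupby(group_col):
        # axis=0: one independent t-test per metric column.
        res = ttest_ind(control_vals, variant_df[metrics].to_numpy(), axis=0)
        for metric, stat, pvalue in zip(metrics, res.statistic, res.pvalue):
            rows.append({'variant': variant, 'metric': metric,
                         't_stat': stat, 'p_value': pvalue})
    return pd.DataFrame(rows).set_index(['variant', 'metric'])


df = pd.DataFrame({'group': 'a a b b'.split(), 'B': [1, 2, 3, 4], 'C': [4, 6, 5, 10]})
print(ttest_vs_control(df))
```

`ttest_ind` assumes equal variances by default (Student's t-test); passing `equal_var=False` runs Welch's t-test instead, which may be more appropriate if the control and variant variances differ.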