I would like to run a simple t-test in Python, but comparing every possible pair of groups with each other. Let's say I have the following data:
```python
import pandas as pd
data = {'Category': ['cat3','cat2','cat1','cat2','cat1','cat2','cat1','cat2','cat1','cat1','cat1','cat2','cat3','cat3'],
        'values': [4,1,2,3,1,2,3,1,2,3,5,1,6,3]}
my_data = pd.DataFrame(data)
```
I want to calculate the t-test p-value for every possible pair of categories, which here are:

- cat1 vs. cat2
- cat2 vs. cat3
- cat1 vs. cat3
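The list of pairs does not have to be written out by hand. Here is a minimal sketch that derives it from the data with the standard library's `itertools.combinations`, assuming the category labels live in the `Category` column as above:

```python
from itertools import combinations

import pandas as pd

data = {'Category': ['cat3', 'cat2', 'cat1', 'cat2', 'cat1', 'cat2', 'cat1',
                     'cat2', 'cat1', 'cat1', 'cat1', 'cat2', 'cat3', 'cat3'],
        'values': [4, 1, 2, 3, 1, 2, 3, 1, 2, 3, 5, 1, 6, 3]}
my_data = pd.DataFrame(data)

# Sorted distinct category labels
categories = sorted(my_data['Category'].unique())

# Every unordered pair of distinct categories; k categories give k*(k-1)/2 pairs
pairs = list(combinations(categories, 2))
print(pairs)
```

Each pair appears exactly once and a category is never paired with itself. This is what the manual list above enumerates, but it adapts automatically when the number of categories changes.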
I can do this manually:

```python
from scipy import stats

# Select the values belonging to each category
cat1 = my_data.loc[my_data['Category'] == 'cat1', 'values']
cat2 = my_data.loc[my_data['Category'] == 'cat2', 'values']
cat3 = my_data.loc[my_data['Category'] == 'cat3', 'values']

# Independent two-sample t-test (scipy default equal_var=True) for each pair
print(stats.ttest_ind(cat1, cat2).pvalue)
print(stats.ttest_ind(cat2, cat3).pvalue)
print(stats.ttest_ind(cat1, cat3).pvalue)
```
But is there a simpler, more general way to do this? The number of categories varies from case to case, so the number of t-tests to run varies as well (k categories give k(k-1)/2 pairs).
The final output should be a DataFrame with one row per comparison and the columns category1, category2 and p-value. For this data it should look like:

| category1 | category2 | p-value |
|-----------|-----------|---------|
| cat1 | cat2 | 0.16970867501294376 |
| cat2 | cat3 | 0.0170622126550303 |
| cat1 | cat3 | 0.13951958313684434 |
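A minimal sketch of one generic approach follows. It assumes the same `Category`/`values` layout and the same `stats.ttest_ind` defaults as the manual calls. It groups the data with `DataFrame.groupby`, loops over `itertools.combinations` of the group labels, and collects one row per pair. The helper name `pairwise_ttests` is hypothetical, chosen only for illustration:

```python
from itertools import combinations

import pandas as pd
from scipy import stats

data = {'Category': ['cat3', 'cat2', 'cat1', 'cat2', 'cat1', 'cat2', 'cat1',
                     'cat2', 'cat1', 'cat1', 'cat1', 'cat2', 'cat3', 'cat3'],
        'values': [4, 1, 2, 3, 1, 2, 3, 1, 2, 3, 5, 1, 6, 3]}
my_data = pd.DataFrame(data)


def pairwise_ttests(df, group_col, value_col):
    """Hypothetical helper: run stats.ttest_ind on every pair of groups."""
    # Map each category label to the Series of its values
    groups = {name: grp[value_col] for name, grp in df.groupby(group_col)}
    rows = []
    # combinations() yields each unordered pair of labels exactly once
    for a, b in combinations(sorted(groups), 2):
        # Same call as the manual version (default equal_var=True)
        pvalue = stats.ttest_ind(groups[a], groups[b]).pvalue
        rows.append({'category1': a, 'category2': b, 'p-value': pvalue})
    return pd.DataFrame(rows, columns=['category1', 'category2', 'p-value'])


result = pairwise_ttests(my_data, 'Category', 'values')
print(result)
```

Because the labels are sorted before pairing, the rows come out as cat1/cat2, cat1/cat3, cat2/cat3. That is a different order from the listing above, but the p-values should match the manual calls. Adding or removing categories needs no code changes.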