I am very aware of this link: Is there a pythonic way to do a contingency table in Pandas? Where two columns are converted to a 2x2 contingency and then fischer exact test can be done easily. What i am struggling with is how can i take the above mentioned solution and apply it to multiple columns. My data looks like this:
Samples A B C D E F G H I J K L M N 0 P
AA1 1 1 0 0 1 0 0 0 1 1 0 0 1 0 0 0
AA2 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0
AA3 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0
AA4 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
AA5 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0
AA6 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
AA7 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
AA8 1 0 0 0 1 1 0 0 1 0 0 0 1 1 0 0
AA9 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 1
AA10 1 1 1 0 1 0 0 1 1 1 1 0 1 0 0 1
AA11 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 1
AA12 1 0 1 0 0 0 1 1 1 0 1 0 0 0 1 1
AA13 0 0 0 0 1 0 1 1 0 0 0 0 1 0 1 1
AA14 1 1 0 0 0 1 1 1 1 1 0 0 0 1 1 1
AA15 1 0 0 1 0 0 1 1 1 0 0 1 0 0 1 1
AA16 0 0 0 0 1 0 1 1 0 0 0 0 1 0 1 1
AA17 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1
AA18 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1
AA19 1 0 0 1 1 1 0 1 1 0 0 1 1 1 0 1
AA20 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
AA21 1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 1
AA22 0 0 0 1 0 0 1 1 0 0 0 1 0 0 1 1
AA23 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
and i would like an output like the following (note the pvalues below are fake and donot represent the above data):
A B C D E F G H I J K L M N O P
A
B 0.05
C 0.9 0.9
D 0.4 0.8 0.8
E 0.002 0.001 0.8 0.02
F 0.12 0.67 0.001 0.8 0.6
G 0.9 0.9 0.8 0.01 0.8 0.8
H 0.1 0.22 0.8 0.8 0.8 0.33 0.01
I 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05
J 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9
K 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.8 0.9 0.9
L 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.8 0.02 0.02
M 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.67 0.001 0.8 0.6 0.8
N 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.8 0.01 0.8 0.8 0.8
O 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.22 0.8 0.8 0.8 0.33 0.01 0.01
P 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.22 0.8 0.8 0.8 0.33 0.01 0.01 0.01
The code so far is:
import pandas as pd
from scipy.stats import fisher_exact
df = pd.read_table("......")
df.set_index("Samples", inplace=True)
print(df.head())
print(pd.crosstab(df.A, df.B))
tab = pd.crosstab(df.A, df.B)
print(fisher_exact(tab))
This gives me a 2x2 and the p-val but i dont know how to iterate through the columns. Like A vs B, A vs C, A vs D...etc...etc.. and put the p-vals in the output format.
Any guidance is highly appreciated!