Here's my pandas dataframe with my data:
c0 c1 c10 c11 c12 c13 c14 c15 c16 c3 c4 c5 c6 c7 c8 c9
index
0 1 49 2.0 0 2 2 0 1 6797.761892 130 269.0 0 1 163 0 0.0
1 0 61 0.0 1 2 2 1 3 4307.686943 138 166.0 0 0 125 1 3.6
2 0 46 0.0 2 3 2 0 1 4118.077502 140 311.0 0 1 120 1 1.8
3 0 69 1.0 3 3 2 1 0 7170.849469 140 254.0 0 0 146 0 2.0
4 0 51 1.0 0 2 2 1 0 5579.040145 100 222.0 0 1 143 1 1.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
283 0 54 0.0 1 2 2 2 0 6293.123474 125 273.0 0 0 152 0 0.5
284 0 42 0.0 0 3 2 0 1 3303.841931 120 240.0 1 1 194 0 0.8
285 1 67 0.0 2 2 2 1 0 3383.029119 106 223.0 0 1 142 0 0.3
286 0 67 1.0 2 3 2 0 2 768.900795 125 254.0 1 1 163 0 0.2
287 0 60 0.0 1 3 2 0 0 1508.832825 130 253.0 0 1 144 1 1.4
288 rows × 16 columns
I used statsmodels to fit a logistic regression of c0 on all the other columns and took the p-value of each coefficient:
import pandas as pd
import statsmodels.api as sm

log = sm.Logit(df['c0'], df.loc[:, df.columns != 'c0']).fit()
d1 = pd.DataFrame(index=log.pvalues.index, data=log.pvalues, columns=['statsmodels_pvalue'])
I then did the same with scipy. The pearsonr function returns the correlation coefficient and the p-value, so I append element [1] of the return value (the p-value), as you can see:
from scipy.stats import pearsonr

index = []
output = []
for i in df.columns[1:]:
    index.append(i)
    # pearsonr returns (correlation, p-value); keep only the p-value
    output.append(pearsonr(df['c0'], df[i])[1])
d2 = pd.DataFrame(index=index, data=output, columns=['pearson_pvalue'])
pd.concat([d1, d2], axis=1)
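As a sanity check on the indexing, here is a small sketch with synthetic data. The frame `toy` and its columns are made up for illustration and are not my real data. It shows that element [1] of what pearsonr returns is the p-value, and that the append loop can also be written as a dict comprehension.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in data (assumption: this is not the real df)
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "c0": rng.integers(0, 2, size=200),
    "c1": rng.normal(size=200),
    "c3": rng.normal(size=200),
})

# pearsonr returns the correlation first and the p-value second
r, p = pearsonr(toy["c0"], toy["c1"])
assert p == pearsonr(toy["c0"], toy["c1"])[1]

# One p-value per column, equivalent to the append loop
pvals = {col: pearsonr(toy["c0"], toy[col])[1] for col in toy.columns[1:]}
toy_d2 = pd.Series(pvals, name="pearson_pvalue").to_frame()
print(toy_d2)
```

Each of these p-values comes from testing one column against c0 on its own, with no other columns involved.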
Results:
     statsmodels_pvalue  pearson_pvalue
c1             0.155704        0.105977
c10            0.449688        0.697069
c11            0.041694        0.038457
c12            0.000269        0.000510
c13            0.012123        0.046765
c14            0.000114        0.000087
c15            0.587200        0.843444
c16            0.301656        0.025142
c3             0.434319        0.330075
c4             0.000163        0.000014
c5             0.792058        0.613432
c6             0.340877        0.454607
c7             0.843758        0.562002
c8             0.365109        0.030531
c9             0.238975        0.070500