Is there a pythonic way to do a contingency table in Pandas?

Question

Given a dataframe that looks like this:

            A   B      
2005-09-06  5  -2  
2005-09-07 -1   3  
2005-09-08  4   5 
2005-09-09 -8   2
2005-09-10 -2  -5
2005-09-11 -7   9 
2005-09-12  2   8  
2005-09-13  6  -5  
2005-09-14  6  -5

Is there a pythonic way to create a 2x2 matrix like this:

    1  0
 1  a  b
 0  c  d

Where:

a = number of obs where the corresponding elements of column A and B are both positive.

b = number of obs where the corresponding elements of column A are positive and negative in column B.

c = number of obs where the corresponding elements of column A are negative and positive in column B.

d = number of obs where the corresponding elements of column A and B are both negative.

For this example the output would be:

    1  0
 1  2  3
 0  3  1

Thanks

Tyr Wiesner-Hanks · Answer 1 · 2016-06-19T17:17:11.930

Probably easiest to just use the pandas function crosstab. Borrowing from Dyno Fu above:

import pandas as pd
from StringIO import StringIO
table = """dt          A   B
2005-09-06  5  -2
2005-09-07 -1   3
2005-09-08  4   5
2005-09-09 -8   2
2005-09-10 -2  -5
2005-09-11 -7   9
2005-09-12  2   8
2005-09-13  6  -5
2005-09-14  6  -5
"""
sio = StringIO(table)
df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
df.set_index("dt", inplace=True)

pd.crosstab(df.A > 0, df.B > 0)

Output:

B      False  True 
A                  
False      1      3
True       3      2

[2 rows x 2 columns]

Also the table is usable if you want to do a Fisher exact test with scipy.stats etc:

from scipy.stats import fisher_exact
tab = pd.crosstab(df.A > 0, df.B > 0)
fisher_exact(tab)

Thanks! I appreciate your efforts. – vy32 Jun 20 '16 at 12:40 — vy32, Jun 20 '16 at 12:40

score 22 · Accepted Answer · answered Apr 27 '15 at 16:51

22

Let us call your dataframe data. Try

a = data['A']>0
b = data['B']>0
data.groupby([a,b]).count()

answered Apr 27 '15 at 16:51

lanenok

2,699
17
24

4

Whoever downvoted this answer: please leave a comment - why. – lanenok Aug 19 '16 at 14:41
This answer creates a dataframe with two indices, and not the desired contingency table format where those two indices are in X and Y axis respectively – StatsNoob Dec 19 '20 at 19:08

Joseph Whiting · Answer 3 · 2021-03-29T09:37:46.047

10

Here's a really useful page about the pandas crosstab function:

https://chrisalbon.com/python/data_wrangling/pandas_crosstabs/

So I think for what you'd like to do you should use

import pandas as pd
pd.crosstab(data['A']>0, data['B']>0)

Hope that helps!

edited Mar 29 '21 at 09:37

answered Aug 23 '16 at 08:53

Joseph Whiting

548
6
12

updated link (I think) https://chrisalbon.com/python/data_wrangling/pandas_crosstabs/ – user2682863 Aug 13 '20 at 14:43
@user2682863 Both links are dead :( – jtlz2 Feb 09 '23 at 20:34

score 6 · Answer 4 · answered Apr 27 '15 at 17:16

import pandas as pd
from StringIO import StringIO

table = """dt          A   B
2005-09-06  5  -2
2005-09-07 -1   3
2005-09-08  4   5
2005-09-09 -8   2
2005-09-10 -2  -5
2005-09-11 -7   9
2005-09-12  2   8
2005-09-13  6  -5
2005-09-14  6  -5
"""
sio = StringIO(table)
df = pd.read_table(sio, sep=r"\s+", parse_dates=['dt'])
df.set_index("dt", inplace=True)

a = df['A'] > 0
b = df['B'] > 0
df1 = df.groupby([a,b]).count()
print df1["A"].unstack()

output:

B      False  True
A
False      1      3
True       3      2

this is just lnanenok's answer and using unstack() to make it more readable. credit should go to lanenok.

Is there a pythonic way to do a contingency table in Pandas?

4 Answers4

Linked