
I have a dataset given below:

a,b,c
1,1,1
1,1,1
1,1,2
2,1,2
2,1,1
2,2,1

I created a crosstab with pandas:

 cross_tab = pd.crosstab(index=a, columns=[b, c], rownames=['a'], colnames=['b', 'c'])

This is the crosstab I get as output:

b        1     2
c        1  2  1
a        
1        2  1  0
2        1  1  1

I want to iterate over this crosstab and look up the count for each combination of a, b and c. How can I get a value such as cross_tab[a=1][b=1, c=1]? Thank you.

user3104352

3 Answers


You can use slicers:

a,b,c = 1,1,1
idx = pd.IndexSlice
print (cross_tab.loc[a, idx[b,c]])
2
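For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the crosstab from the data in the question and then applies the slicer. The DataFrame name df and the variables value and row_slice are assumptions made for illustration; the question does not show how a, b and c were loaded.

```python
import pandas as pd

# Data copied from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2],
    "b": [1, 1, 1, 1, 1, 2],
    "c": [1, 1, 2, 2, 1, 1],
})
cross_tab = pd.crosstab(
    index=df["a"], columns=[df["b"], df["c"]],
    rownames=["a"], colnames=["b", "c"],
)

idx = pd.IndexSlice

# Scalar lookup: row label a=1, column label (b=1, c=1).
value = cross_tab.loc[1, idx[1, 1]]

# Partial lookup: every c under b=1 for row a=2 (a Series indexed by b, c).
row_slice = cross_tab.loc[2, idx[1, :]]
```

Here idx[1, 1] is just the tuple (1, 1); pd.IndexSlice mainly pays off once a slice such as : appears in one of the levels.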

You can also reshape the crosstab into a Series with DataFrame.unstack, put the index levels in a, b, c order with reorder_levels, and then use loc:

a = cross_tab.unstack().reorder_levels(('a','b','c'))
print (a)
a  b  c
1  1  1    2
2  1  1    1
1  1  2    1
2  1  2    1
1  2  1    0
2  2  1    1
dtype: int64

print (a.loc[1,1,1])
2
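Once the crosstab is flattened this way, looping over every combination is a plain iteration over the Series. The sketch below assumes a recent pandas, where Series.items() is the available spelling (Series.iteritems() was removed in pandas 2.0). The names df, counts, seen and missing are illustrative, not from the original post.

```python
import pandas as pd

# Data copied from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2],
    "b": [1, 1, 1, 1, 1, 2],
    "c": [1, 1, 2, 2, 1, 1],
})
cross_tab = pd.crosstab(
    index=df["a"], columns=[df["b"], df["c"]],
    rownames=["a"], colnames=["b", "c"],
)

# Flatten to a Series keyed by (a, b, c).
counts = cross_tab.unstack().reorder_levels(["a", "b", "c"])

# Visit every (a, b, c) combination that has a cell in the crosstab.
seen = {}
for (a, b, c), n in counts.items():
    seen[(a, b, c)] = int(n)

# A combination with no cell, such as (2, 2, 2), raises KeyError with .loc;
# Series.get returns a default value instead.
missing = counts.get((2, 2, 2), 0)
```

Note that the crosstab only has cells for (b, c) pairs that occur somewhere in the data, so a lookup like (2, 2, 2) has no entry at all rather than a zero.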
jezrael
  • Thank you for your answer. I have another problem now. It raises an error if I call `print(cross_tab2.loc[2,2,2])`. How can I get the values of a, b and c in a crosstab? – user3104352 Aug 06 '17 at 18:00
  • Do you want to loop over `a`? Or, for `print(cross_tab2.loc[2,2,2])`, what is the desired output? – jezrael Aug 06 '17 at 18:03
  • Actually, I want to loop over all possible combinations of a, b, c. Or I can loop over all values of a, b, c that exist in the crosstab. – user3104352 Aug 06 '17 at 18:04
  • Is it possible to use `for (a,b,c), x in a.iteritems(): print (a,b,c) print (x)`? – jezrael Aug 06 '17 at 18:07

You are looking for get_level_values, called on the crosstab's index and on its columns:

In [777]: cross_tab.loc[cross_tab.index.get_level_values('a') == 1,\
                        (cross_tab.columns.get_level_values('b') == 1)\
                      & (cross_tab.columns.get_level_values('c') == 1)]
Out[777]: 
b  1
c  1
a   
1  2
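Boolean masks like these select a 1x1 DataFrame rather than a bare number. The sketch below, which rebuilds the question's data under the assumed name df, shows one way to pull the scalar out with .iat; the names row_mask, col_mask, sub and value are illustrative.

```python
import pandas as pd

# Data copied from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2],
    "b": [1, 1, 1, 1, 1, 2],
    "c": [1, 1, 2, 2, 1, 1],
})
cross_tab = pd.crosstab(
    index=df["a"], columns=[df["b"], df["c"]],
    rownames=["a"], colnames=["b", "c"],
)

# One boolean mask per axis, built from the level values.
row_mask = cross_tab.index.get_level_values("a") == 1
col_mask = (
    (cross_tab.columns.get_level_values("b") == 1)
    & (cross_tab.columns.get_level_values("c") == 1)
)

# The selection is still a DataFrame (here with one row and one column).
sub = cross_tab.loc[row_mask, col_mask]

# Extract the single cell by position.
value = sub.iat[0, 0]
```

The mask form is more verbose than a label lookup, but it extends naturally to conditions such as b >= 1 that labels alone cannot express.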
cs95

Another option, at the cost of a little readability, is to chain .loc calls to navigate the hierarchical index that pandas.crosstab generates. The following example illustrates it:

import pandas as pd
import numpy as np

np.random.seed(1234)

df = pd.DataFrame(
    {
        "a": np.random.choice([1, 2], 5, replace=True),
        "b": np.random.choice([11, 12, 13], 5, replace=True),
        "c": np.random.choice([21, 22, 23], 5, replace=True),
    }
)
df

Output

    a   b   c
0   2   11  23
1   2   11  23
2   1   12  23
3   2   12  21
4   1   12  21

crosstab output is:

cross_tab = pd.crosstab(
    index=df.a, columns=[df.b, df.c], rownames=["a"], colnames=["b", "c"]
)
cross_tab

b   11  12
c   23  21  23
a           
1   0   1   1
2   2   1   0

Now, say you want to access the value for a==2, b==11 and c==23. Then simply do:

cross_tab.loc[2].loc[11].loc[23]

2

Why does this work? .loc allows one to select by index labels. In the dataframe output by crosstab, our erstwhile column values now become index labels. Thus, with every .loc selection we do, it gives the slice of the dataframe corresponding to that index label. Let's navigate cross_tab.loc[2].loc[11].loc[23] step by step:

cross_tab.loc[2]

yields:

b   c 
11  23    2
12  21    1
    23    0
Name: 2, dtype: int64

Next one:

cross_tab.loc[2].loc[11]

yields:

c
23    2
Name: 2, dtype: int64

And finally we have

cross_tab.loc[2].loc[11].loc[23]

which yields:

2

Why do I say that this reduces readability a bit? Because to understand this selection you have to know how the crosstab was created, i.e. that the rows are a and the columns are in the order [b, c]. Only then can you interpret what cross_tab.loc[2].loc[11].loc[23] does. But I have often found that to be a good tradeoff.
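As a hedged aside, the same cell can also be reached with a single .loc call that takes the row label and a (b, c) tuple, and the level order can be read from the crosstab itself instead of being remembered. The sketch below hard-codes the five rows shown in the output above rather than relying on the random seed; the names chained, direct and level_order are illustrative.

```python
import pandas as pd

# Rows copied from the printed DataFrame above (not regenerated from the seed).
df = pd.DataFrame({
    "a": [2, 2, 1, 2, 1],
    "b": [11, 11, 12, 12, 12],
    "c": [23, 23, 23, 21, 21],
})
cross_tab = pd.crosstab(
    index=df.a, columns=[df.b, df.c], rownames=["a"], colnames=["b", "c"]
)

# Chained selection, one level at a time, as in the answer.
chained = cross_tab.loc[2].loc[11].loc[23]

# Equivalent single lookup: row label, then the (b, c) column tuple.
direct = cross_tab.loc[2, (11, 23)]

# The column level order is recorded on the crosstab.
level_order = list(cross_tab.columns.names)
```

Checking cross_tab.columns.names before indexing removes most of the guesswork about which label goes in which position.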

Alok Lal