Another option, at the cost of a little readability, is to simply use `.loc` to navigate the hierarchical index generated by `pandas.crosstab`. The following example illustrates it:
import pandas as pd
import numpy as np
np.random.seed(1234)
df = pd.DataFrame(
    {
        "a": np.random.choice([1, 2], 5, replace=True),
        "b": np.random.choice([11, 12, 13], 5, replace=True),
        "c": np.random.choice([21, 22, 23], 5, replace=True),
    }
)
df
Output:
   a   b   c
0  2  11  23
1  2  11  23
2  1  12  23
3  2  12  21
4  1  12  21
The `crosstab` output is:
cross_tab = pd.crosstab(
index=df.a, columns=[df.b, df.c], rownames=["a"], colnames=["b", "c"]
)
cross_tab
b  11 12
c  23 21 23
a
1   0  1  1
2   2  1  0
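To see the hierarchical index that `.loc` will navigate, it can help to inspect `cross_tab` directly. The sketch below rebuilds the five rows printed above explicitly instead of calling `np.random`, on the assumption that those printed rows are what the seed produced; it then checks that the rows are labelled by `a` and the columns form a two-level MultiIndex of `(b, c)` pairs.

```python
import pandas as pd

# Rebuild the five rows printed above directly, so the example does not depend on the RNG
df = pd.DataFrame({"a": [2, 2, 1, 2, 1], "b": [11, 11, 12, 12, 12], "c": [23, 23, 23, 21, 21]})

cross_tab = pd.crosstab(
    index=df.a, columns=[df.b, df.c], rownames=["a"], colnames=["b", "c"]
)

# Rows are labelled by the values of a
print(cross_tab.index.name)               # a

# Columns are a two-level MultiIndex holding the (b, c) pairs that occur in df
print(type(cross_tab.columns).__name__)   # MultiIndex
print(list(cross_tab.columns.names))      # ['b', 'c']
print(cross_tab.values.tolist())          # [[0, 1, 1], [2, 1, 0]]
```

Only observed `(b, c)` combinations appear as columns, which is why there are three columns rather than every pairing of `b` and `c`.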
Now, say you want to access the value where `a == 2`, `b == 11` and `c == 23`. Then simply do:
cross_tab.loc[2].loc[11].loc[23]
2
Why does this work? `.loc` selects by label. In the DataFrame returned by `crosstab`, the values of our erstwhile columns become labels: the values of `a` label the rows, and the `(b, c)` value pairs label the columns as a two-level MultiIndex. Each `.loc` call therefore returns the slice corresponding to one label, with one index level fewer than before. Let's walk through `cross_tab.loc[2].loc[11].loc[23]` step by step:
cross_tab.loc[2]
yields:
b   c
11  23    2
12  21    1
    23    0
Name: 2, dtype: int64
Next:
cross_tab.loc[2].loc[11]
yields:
c
23    2
Name: 2, dtype: int64
And finally, we have
cross_tab.loc[2].loc[11].loc[23]
which yields:
2
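The three steps above can be checked programmatically. This is a small sketch, again rebuilding the printed rows explicitly (an assumption standing in for the seeded `np.random` calls), that stores each intermediate result and inspects its type and index, showing how each `.loc` peels off one label level.

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 2, 1, 2, 1], "b": [11, 11, 12, 12, 12], "c": [23, 23, 23, 21, 21]})
cross_tab = pd.crosstab(
    index=df.a, columns=[df.b, df.c], rownames=["a"], colnames=["b", "c"]
)

step1 = cross_tab.loc[2]   # row a == 2: a Series indexed by the (b, c) MultiIndex
step2 = step1.loc[11]      # fix b == 11: level b is dropped, leaving an index on c
step3 = step2.loc[23]      # fix c == 23: a single scalar count

print(type(step1).__name__, list(step1.index.names))  # Series ['b', 'c']
print(type(step2).__name__, step2.index.name)         # Series c
print(step3)                                          # 2
```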
Why do I say this costs a little readability? Because to understand the selection you have to know how the crosstab was created, i.e. that the rows are `a` and the columns are in the order `[b, c]`. Without that knowledge you cannot tell what `cross_tab.loc[2].loc[11].loc[23]` selects. Still, I have often found it to be a good tradeoff.
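For comparison, pandas also accepts a single `.loc` call that takes the row label and a tuple with one label per column level. This is a standard alternative rather than the approach described above, and it still assumes you know the rows are `a` and the columns are `(b, c)`; it merely makes the row/column split explicit. The rows are rebuilt from the printed output, as an assumption in place of the seeded random data.

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 2, 1, 2, 1], "b": [11, 11, 12, 12, 12], "c": [23, 23, 23, 21, 21]})
cross_tab = pd.crosstab(
    index=df.a, columns=[df.b, df.c], rownames=["a"], colnames=["b", "c"]
)

# One .loc call: the row label first, then a (b, c) tuple addressing the column MultiIndex
single = cross_tab.loc[2, (11, 23)]

# The chained form discussed above, for comparison
chained = cross_tab.loc[2].loc[11].loc[23]

print(single, chained)  # 2 2
```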