1

I would like to get a dataframe of counts from a pandas pivot table, but for the aggregate function to include every index. For example

df1 = pd.DataFrame({"A": ["foo", "bar", "shtuff"],
                   "B": ["one", "two", "one"],
                   "C": [1, 2, 3]})

df2 = pd.DataFrame({"C": [1,2,2,1,2,1,1,2,2],
                   "D": ["A", "B", "A", "B", "A",
                         "B", "A", "C", "A"]})

df = df1.merge(df2,how='left')

pd.pivot_table(df, index='A',columns='D', values='C', aggfunc='count', fill_value=0)

will result in

D    A  B  C
A
bar  3  1  1
foo  2  2  0

But I would like to get

D       A  B  C
A
bar     3  1  1
foo     2  2  0
shtuff  0  0  0

How can I achieve this? I've been looking through the documentation but can't seem to find an answer. Thanks in advance :)

Juliepvr
  • 33
  • 3

1 Answers1

1

You can use a Categorical:

df1['A'] = pd.Categorical(df1['A']) 

df = df1.merge(df2, how='left')

pd.pivot_table(df, index='A',columns='D', values='C', aggfunc='count',
               fill_value=0, observed=False)

Note that your pivot_table is equivalent to a crosstab:

pd.crosstab(df['A'], df['D'], dropna=False)

Output:

D       A  B  C
A              
bar     3  1  1
foo     2  2  0
shtuff  0  0  0

Alternatively, reindex:

pd.pivot_table(df, index='A',columns='D', values='C', aggfunc='count',
               fill_value=0, observed=False
              ).reindex(df['A'].unique(), fill_value=0)
mozway
  • 194,879
  • 13
  • 39
  • 75