Pandas pivot_table with pd.grouper and Margins

Question

Margins=True will not work in Pandas pivot_table when columns is set as pd.grouper datetime. this is my code which works as expected--

p = df.pivot_table(values='Qty', index=['ItemCode', 'LineItem'],columns=pd.Grouper(key = 'Date', freq='W'), aggfunc=np.sum, fill_value=0)

but if I add margins=True, so I get a subtotal, I get error saying:

KeyError: "[TimeGrouper(key='In time', freq=, axis=0, sort=True, closed='left', label='left', how='mean', convention='e', base=0)] not in index"

Can you add some sample data, please ? – Marcel Flygare May 26 '20 at 09:53 — Marcel Flygare, May 26 '20 at 09:53

Bertil Johannes Ipsen · Answer 1 · 2020-05-28T11:52:53.130

That looks strange! I wonder what causes the pivot table to use the TimeGrouper itself to be used as the index. It's seems like a bug, but I'm not sure. In any case, I think pivottables aren't able to do sub-index margins, so here is a solution with groupby instead:

Sample data

import pandas as pd
from random import randint, choice
from string import ascii_letters, ascii_lowercase

# Say we have a dataframe with 500 rows and 20 different items
df_len = range(500)
item_codes = [''.join([choice(ascii_letters) for _ in range(10)]) for __ in range(20)]
df = pd.DataFrame({
    'ItemCode': [choice(item_codes) for __ in df_len],
    'Date': [pd.datetime.today() - pd.Timedelta(randint(0, 28), 'D') for _ in df_len],
    'Qty': [randint(1,10) for _ in df_len],
    'LineItem': [choice(('a', 'b', 'c')) for _ in df_len],
})

df.head()

     ItemCode                       Date  Qty LineItem
0  IFaEmWGHTJ 2020-05-21 13:29:56.687412    8        a
1  jvLqoLfBcd 2020-05-23 13:29:56.687509    6        a
2  GOPFJEoSUm 2020-05-13 13:29:56.687550    1        a
3  qJqzzgDTaa 2020-05-03 13:29:56.687575    5        a
4  BCvRrgcpFD 2020-05-24 13:29:56.690114    8        b

Solution

res = (df.groupby(['ItemCode', 'LineItem', pd.Grouper(key='Date', freq='W')])['Qty']
       .count()
       .unstack()
       .fillna(0))
res.loc[('column_total', ''), :] = res.sum(axis=0)
res.loc[:,'row_total'] = res.sum(axis=1)

Result

|                      |   2020-05-03 |   2020-05-10 |   2020-05-17 |   2020-05-24 |   2020-05-31 |   row_total |
|:---------------------|-------------:|-------------:|-------------:|-------------:|-------------:|------------:|
| ('CtdClujjRF', 'a')  |            1 |            2 |            2 |            0 |            0 |           5 |
| ('CtdClujjRF', 'b')  |            0 |            3 |            1 |            1 |            1 |           6 |
| ('CtdClujjRF', 'c')  |            1 |            1 |            2 |            2 |            1 |           7 |
| ('DnQcEbHoVL', 'a')  |            0 |            2 |            1 |            1 |            1 |           5 |
| ('DnQcEbHoVL', 'b')  |            1 |            1 |            1 |            2 |            2 |           7 |
                     ...            ...            ...            ...            ...            ...           ...
| ('sxFnkCcSJu', 'c')  |            0 |            2 |            2 |            3 |            0 |           7 |
| ('vOaWNHgOgm', 'a')  |            0 |            5 |            1 |            7 |            1 |          14 |
| ('vOaWNHgOgm', 'b')  |            1 |            0 |            1 |            3 |            4 |           9 |
| ('vOaWNHgOgm', 'c')  |            1 |            2 |            2 |            5 |            1 |          11 |
| ('column_total', '') |           64 |          128 |          115 |          127 |           66 |         500 |

Nicely done. I guess groupby does have some advantages over pivot_table after all. — jroyce, Jun 08 '20 at 17:52

Pandas pivot_table with pd.grouper and Margins

1 Answers1