I'm looking to build a pivot table which should tell me how many distinct stores have sold an item for a cumulative amount of months.
I have sales data from a convenience stores chain. The data is provided by 'item - store - month' level of aggregation, with values such as '# of items sold', 'total sum' etc. Stores are given by distinct id.
Here's an excerpt from my data:
Here's the final table I want to achieve (I can do it in Pandas, not in Excel).
To understand what's going on in this table:
E.g. sku #2 was being sold in 18 stores for 1 month out of 12; in 44 shops for 2 months out of 12 etc.;
it wasn't sold for 12 months in any of the stores (value 0)
sku # 3 was being sold in 25 stores for 1 month out of 12; in 41 stores for 2 months out of 12 etc.;
it was sold for 12 months in 33 stores out of 471 unique stores that ever sold this item this year.
Totals column shows unique stores that sold the item for the whole year, totals row shows unique stores that sold all items for the selected number of months.
Is there a way of achieving this in Excel?
Edit: here's the original data in xlsx format: OneDrive storage
Edit 2: The steps I did in pandas to achieve this result are:
- Group by [item_name, store_id] and calculate the number of unique months (distinct count) within each groups. The resulting table has three columns now: item_name, store_id, number of months.
- Pivot the resulting table: rows = item_name, cols = unique_months, vals = store_id, aggregation function = nunique (distinct count).
sku_stores_months = db_all.groupby(['item_name', 'store_id'])['month'].nunique().reset_index()
sku_stores_months = sku_stores_months.pivot_table(index='item_name', columns='month', values='store_id', aggfunc='nunique', margins=True, fill_value=0, dropna=False).reset_index()
sku_stores_months = sku_stores_months.melt(id_vars=['item_name'])
The last 'melt' step is only needed to save to Excel so that I can recreate pivot from the 'long' table in Excel.
Edit 3: I have been able to achieve my goal by reproducing steps I did in Pandas. First, I created a helper sheet with a Pivot Table on store_id and item_name as rows and a Distinct Count of dates as values. I had to also choose an old style Pivot Table in Excel 365 and check the Fill all values checkbox to get a grouped table like pandas would do.
Then I built another Pivot Table based on the helper table where I put item_names in rows, the new 'total' field (distinct count from helper) to columns, and another distinct count of store_id in values.
The resulting table works; however there are two downsides that make me keep looking for a cleaner solution:
- need for a hidden sheet with intermediate calculations
- in case of source update (which in my case happens quite often) I have to manually unhide helper sheet, update helper Pivot table, then update the resulting table. It is very frustrating.