1

I'm looking to build a pivot table which should tell me how many distinct stores have sold an item for a cumulative amount of months.

I have sales data from a convenience stores chain. The data is provided by 'item - store - month' level of aggregation, with values such as '# of items sold', 'total sum' etc. Stores are given by distinct id.

Here's an excerpt from my data:

Here's the final table I want to achieve (I can do it in Pandas, not in Excel). enter image description here

To understand what's going on in this table:

E.g. sku #2 was being sold in 18 stores for 1 month out of 12; in 44 shops for 2 months out of 12 etc.;

it wasn't sold for 12 months in any of the stores (value 0)

sku # 3 was being sold in 25 stores for 1 month out of 12; in 41 stores for 2 months out of 12 etc.;

it was sold for 12 months in 33 stores out of 471 unique stores that ever sold this item this year.

Totals column shows unique stores that sold the item for the whole year, totals row shows unique stores that sold all items for the selected number of months.

Is there a way of achieving this in Excel?

Edit: here's the original data in xlsx format: OneDrive storage

Edit 2: The steps I did in pandas to achieve this result are:

  1. Group by [item_name, store_id] and calculate the number of unique months (distinct count) within each groups. The resulting table has three columns now: item_name, store_id, number of months.
  2. Pivot the resulting table: rows = item_name, cols = unique_months, vals = store_id, aggregation function = nunique (distinct count).

sku_stores_months = db_all.groupby(['item_name', 'store_id'])['month'].nunique().reset_index()

sku_stores_months = sku_stores_months.pivot_table(index='item_name', columns='month', values='store_id', aggfunc='nunique', margins=True, fill_value=0, dropna=False).reset_index()

sku_stores_months = sku_stores_months.melt(id_vars=['item_name'])

The last 'melt' step is only needed to save to Excel so that I can recreate pivot from the 'long' table in Excel.

Edit 3: I have been able to achieve my goal by reproducing steps I did in Pandas. First, I created a helper sheet with a Pivot Table on store_id and item_name as rows and a Distinct Count of dates as values. I had to also choose an old style Pivot Table in Excel 365 and check the Fill all values checkbox to get a grouped table like pandas would do.

Then I built another Pivot Table based on the helper table where I put item_names in rows, the new 'total' field (distinct count from helper) to columns, and another distinct count of store_id in values.

The resulting table works; however there are two downsides that make me keep looking for a cleaner solution:

  • need for a hidden sheet with intermediate calculations
  • in case of source update (which in my case happens quite often) I have to manually unhide helper sheet, update helper Pivot table, then update the resulting table. It is very frustrating.
wujie
  • 21
  • 2
  • *Is there a way of achieving this in Excel?* Probably yes but not sure with Pivot Tables. You should add an example data easy to copy/paste (not just an image) and a sample output of the data provided (so everyone willing to help can check if they are doing correctly what you need). – Foxfire And Burns And Burns Mar 29 '23 at 11:25
  • I have added the original data. – wujie Mar 29 '23 at 12:17
  • I get 18 for item_name 2 for 1 month, but I get 52 for same item for 2 months. ¿Are you sure 44 is the right output? Can you explain the logic behind the calculus? – Foxfire And Burns And Burns Mar 29 '23 at 12:28
  • 1. Group by [item_name, store_id] and calculate number of unique months (distinct count) within each groups. The resulting table has three columns now: item_name, store_id, number of months. 2. Pivot the resulting table: rows = item_name, cols = unique_months, vals = store_id, aggregation function = nunique (distinct count). – wujie Mar 29 '23 at 13:12
  • These are the steps I do it by in pandas. sku_stores_months = db_all.groupby(['sku_name_ru', 'store_id'])['month'].nunique().reset_index() sku_stores_months = sku_stores_months.pivot_table(index='sku_name_ru', columns='month', values='store_id', aggfunc='nunique', margins=True, fill_value=0, dropna=False).reset_index() sku_stores_months = sku_stores_months.melt(id_vars=['sku_name_ru']) The last step 'melt' is only needed to save to Excel so that I can recreate pivot from the 'long' table in Excel. – wujie Mar 29 '23 at 13:12
  • @Foxfire And Burns And Burns I'm thinking your calculation does **cumulative of cumulative**, i.e. for 2 months you get distinct stores for month + for 2 months, which is mathematically sound but logically void (value for 2 months should already include stores for 1 + 2 months). This gets complicated to explain, sorry. It's quite a convoluted calculation. – wujie Mar 29 '23 at 13:17
  • I see your point. You should edit your question and add all this new info. – Foxfire And Burns And Burns Mar 29 '23 at 13:22
  • Please edit your question to provide the original data sample **as text which can be copy/pasted**. NOT just an image. – Ron Rosenfeld Mar 29 '23 at 19:14
  • @RonRosenfeld Have already done that, there's a link in the OP. – wujie Mar 30 '23 at 05:56

0 Answers0