2

I have two dataframes. The first one, grouper contains the mean sales quantity per month by item. Most items have values for all 12 months, because they have been sold for > 1 year. But the items that have been on sale < 1 year do not have values for all months. Example: grouper[grouper['Product'] == 'IT984359570']

        Product       Month   Sales Quantity [QTY]
4190    IT984359570   4       35.0
4191    IT984359570   5       208.0
4192    IT984359570   6       208.0
4193    IT984359570   7       233.0
4194    IT984359570   8       191.0

The second dataframe is a pivot table that displays the cumulative sum of sales by product, pivot_table. This also takes into account new orders (hence the positive amount in some cells). pivot_table[pivot_table['Product'] == 'IT984359570'] returns:

Date    Product     2022-05-01  2022-06-01  2022-07-01  2022-08-01  2022-09-01  2022-10-01  2022-11-01
412     IT984359570 -208.0     -416.0       -649.0      -840.0      2019.0      NaN         NaN  

I would like to avoid dropping all rows that have NaN values. I want to fill all NaN values with the mean of all entries for the specific product from grouper. For product IT984359570: Fill all NaN values in row 412 with 175, as (35+208+208+233+191)/5=175.

I have tried doing that with the code

pivot_table = pivot_table.fillna(lambda row: grouper.loc[grouper['Product'] == row['Product'], 'Sales Quantity [QTY]'].mean())

However, I do not get the desired output. My output:

Date    Product      2022-05-01  2022-06-01  2022-07-01  2022-08-01  2022-09-01  2022-10-01   2022-11-01    
412     IT984359570  -208.0      -416.0      -649.0      -840.0      2019.0      <function <lambda> at 0x0000023221232320>   <function <lambda> at 0x0000023221232320>

What am I doing wrong?

Edit:

pivot_table uses .cumsum(), so the desired output looks like this:

Date    Product     2022-05-01  2022-06-01  2022-07-01  2022-08-01  2022-09-01  2022-10-01  2022-11-01
412     IT984359570 -208.0     -416.0       -649.0      -840.0      2019.0      1844.0      1669.0  
asuidncsdk
  • 67
  • 8

2 Answers2

1

Just adding onto the above contributions, I think adding the axis parameter would complete the code. Hope it helps.

pivot_table.apply(lambda x: x.fillna(grouper.loc[grouper['Product'] == x['Product'], 'Sales Quantity'].mean()), axis=1)
  • Thanks for your help, that worked. However, I ran into another problem: `pivot_table` uses .cumsum(). The code: `pivot_table = (final_output.groupby(['Product', 'Date'])['Quantity'].sum().reset_index().pivot_table(index=['Product'], columns='Date', values='Quantity').cumsum(axis=1).reset_index())`. However, when I try implementing .cumsum(axis=1) after your code, it throws me an error message. Do you have any idea how I can implement the .fillna() before the .cumsum() so that the filled in values are taken into account in the calculation? Please see my edit in the post above – asuidncsdk Sep 14 '22 at 15:33
  • Can you help me what kind of error message you're getting? Is it something like this? [unsupported operand type(s) for +: 'int' and 'str'] – Kim Hyun Bin Sep 14 '22 at 15:49
  • Additionally, I believe for your problem, if I understood it correctly, you could replace axis=1 with axis=0 for the .cumsum() part so that it becomes .cumsum(axis=0) and would work just fine. For me it worked but I am not familiar with the full picture of your dataframe so I can't say so confidently. – Kim Hyun Bin Sep 14 '22 at 15:52
  • 1
    I managed to make it work, thanks for your help. The code I used: `pivot_table = (final_output.groupby(['Product', 'Date'])['Quantity'].sum().reset_index().pivot_table(index=['Product'], columns='Date', values='Quantity').reset_index())` `pivot_table = pivot_table.apply(lambda x: x.fillna(grouper.loc[grouper['Product'] == x['Product'], 'Sales Quantity [QTY]'].mean()), axis=1)` `pivot_table.loc[:, ~pivot_table.columns.isin(['Product'])] = pivot_table.loc[:, ~pivot_table.columns.isin(['Product'])].cumsum(axis=1)` – asuidncsdk Sep 14 '22 at 15:52
0

To get the desired output:

Date    Product     2022-05-01  2022-06-01  2022-07-01  2022-08-01  2022-09-01  2022-10-01  2022-11-01
412     IT984359570 -208.0     -416.0       -649.0      -840.0      2019.0      1844.0      1669.0  

I used:

pivot_table = (final_output.groupby(['Product', 'Date'])['Quantity'].sum().reset_index()
                           .pivot_table(index=['Product'], columns='Date', values='Quantity').reset_index()
)
                           
pivot_table = pivot_table.apply(lambda x: x.fillna(grouper.loc[grouper['Product'] == x['Product'], 'Sales Quantity [QTY]'].mean()), axis=1)
pivot_table.loc[:, ~pivot_table.columns.isin(['Product'])] = pivot_table.loc[:, ~pivot_table.columns.isin(['Product'])].cumsum(axis=1)

While it worked, I am not sure if this is the most pythonic way. Please advise if you have any idea how to achieve this with less code...

asuidncsdk
  • 67
  • 8