6

So I have this df

SUPPLIER   PRODUCTID   STOREID   BALANCE   AVG_SALES   TO_SHIP
SUP1       P1          STR1      50        5           18
SUP1       P1          STR2      6         7           18
SUP1       P1          STR3      74        4           18
SUP2       P4          STR1      35        3           500
SUP2       P4          STR2      5         4           500
SUP2       P4          STR3      54        7           500

It's always grouped by Supplier and product ID. The TO_SHIP column is unique for the group. So for example, I have 18 products for that SUP1 with P1 to send. Then I add new columns:

  • Calculate Wk_bal = (BALANCE / AVG_SALES)
  • Rank Wk_bal per supplierid-productid group
  • Lowest Wk_bal for the group : SEND_PKGS = +1
  • Then Calculate Wk_bal again but add pkg sent = ((BALANCE+SEND_PKGS) / AVG_SALES)
  • So this loops until all TO_SHIP has been distributed to the stores who need the most

To visualize a run:

First output (calculate wk_bal, then send 1 pkg to the lowest):

SUPPLIER   PRODUCTID   STOREID   BALANCE   AVG_SALES   TO_SHIP   Wk_Bal     SEND_PKGS
SUP1       P1          STR1      50        5           18        10         0           
SUP1       P1          STR2      6         4           18        1.5        1
SUP1       P1          STR3      8         4           18        2          0
SUP2       P4          STR1      35        3           500       11.67      0
SUP2       P4          STR2      5         4           500       1.25       1
SUP2       P4          STR3      54        7           500       7.71       0

Second output (calculate updated wk_bal, send one pkg to lowest):

SUPPLIER   PRODUCTID   STOREID   BALANCE   AVG_SALES   TO_SHIP   Wk_Bal     SEND_PKGS 
SUP1       P1          STR1      50        5           17        10         0           
SUP1       P1          STR2      8         4           17        1.75       2
SUP1       P1          STR3      8         4           17        2          0
SUP2       P4          STR1      35        3           499       11.67      0
SUP2       P4          STR2      7         4           499       1.5        2
SUP2       P4          STR3      54        7           499       7.71       0

And so on...so until there is to_ship left, calculate-rank-give one pkg. The reason for this process is I want to make sure that the store with the lowest wk_balance get the package first. (and there's a lot of other reasons why)

I initially built this on SQL, but with the complexity I moved to python. Unfortunately my python isn't very good in coming up with loops with several conditions esp on pandas df. So far I've tried (and failed):

df['Wk_Bal'] = 0
df['TO_SHIP'] = 0

for i in df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']:
    if i > 0:
          df['Wk_Bal'] = df['BALANCE'] / df['AVG_SALES']
          df['TO_SHIP'] = df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']-1
          df['SEND_PKGS'] = + 1
          df['BALANCE'] = + 1
    else:
         df['TO_SHIP'] = 0

How do I do this better?

its.kcl
  • 123
  • 7

1 Answers1

1

Hopefully I've understood all of your requirements. Here is your original data:

df = pd.DataFrame({'SUPPLIER': ['SUP1', 'SUP1', 'SUP1', 'SUP2', 'SUP2', 'SUP2'],
          'PRODUCTID': ['P1', 'P1', 'P1', 'P4', 'P4', 'P4'],
          'STOREID': ['STR1', 'STR2', 'STR3', 'STR1', 'STR2', 'STR3'],
          'BALANCE': [50, 6, 74, 35, 5, 54],
          'AVG_SALES': [5, 4, 4, 3, 4, 7],
          'TO_SHIP': [18, 18, 18, 500, 500, 500]})

Here is my approach:

df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']
  
while (df['TO_SHIP'] != 0).any():
  lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])['Wk_bal'].idxmin()
  df.loc[lowest_idx, 'SEND_PKGS'] += 1
  df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
  df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1

I continue updating df until the TO_SHIP column is all zero. Then I increment SEND_PKGS which correspond to the lowest Wk_bal of each group. Then update Wk_bal and decrement any non-zero TO_SHIP columns.

I end up with:

    SUPPLIER    PRODUCTID   STOREID BALANCE AVG_SALES   TO_SHIP SEND_PKGS   Wk_bal
0   SUP1        P1          STR1    50      5           0          0            10.000000
1   SUP1        P1          STR2    6       4           0         18            6.000000
2   SUP1        P1          STR3    74      4           0          0            18.500000
3   SUP2        P4          STR1    35      3           0         92            42.333333
4   SUP2        P4          STR2    5       4           0        165            42.500000
5   SUP2        P4          STR3    54      7           0        243            42.428571

Edit: In the case of multiple Wk_bal minimums, we can choose based on the minimum AVG_SALES:

def find_min(x):
  num_mins = x["Wk_bal"].loc[x["Wk_bal"] == x["Wk_bal"].min()].shape[0]
  if num_mins == 1:
    return(x["Wk_bal"].idxmin())
  else:
    min_df = x.loc[x["Wk_bal"] == x["Wk_bal"].min()]
  return(min_df["AVG_SALES"].idxmin())

Then, more or less as before:

df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']

while (df['TO_SHIP'] != 0).any():
  lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])[['Wk_bal', 'AVG_SALES']].apply(find_min)
  df.loc[lowest_idx, 'SEND_PKGS'] += 1
  df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
  df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1
Ted
  • 1,189
  • 8
  • 15
  • 1
    And now I see that you had already asked this question here https://stackoverflow.com/q/69966974/3763302 – Ted Nov 16 '21 at 23:42
  • How do you handle similar Wk_bal in this approach? Was just wondering how you might do that. – its.kcl Nov 17 '21 at 12:39
  • So right now, it's looking at Wk_bal I was thinking if both are similar, I'll look at AVG Sales. But not sure how I can inject that. – its.kcl Nov 17 '21 at 12:40
  • Yes, if there's more than one minimum for `Wk_bal`, then `idxmin` will returns the index of first occurrence of the minimum, so it'll be arbitrary which of the minimum valued rows will be incremented. See updated answer if you would like to choose by `AVG_SALES` – Ted Nov 17 '21 at 14:53