0

So I want to spread the shipments per ID in the group one by one by looking at avg sales to determine who to give it to.

Here's my dataframe:

ID   STOREID    BAL       SALES   SHIP
1     STR1      50        5       18
1     STR2      6         7       18
1     STR3      74        4       18
2     STR1      35        3       500
2     STR2      5         4       500
2     STR3      54        7       500

While SHIP (grouped by ID) is greater than 0, calculate AVG (BAL/SALES) and the lowest AVG per group give +1 to its column BAL and +1 to its column final. And then repeat the process until SHIP is 0. The AVG would be different every stage which is why I wanted it to be a while loop.

Sample output of first round is below. So do this until SHIP is 0 and SUM of Final per ID is = to SHIP:

ID   STOREID   BAL   SALES  SHIP   AVG        Final
1    STR1      50    5      18     10         0
1    STR2      6     4      18     1.5        1
1    STR3      8     4      18     2          0
2    STR1      35    3      500    11.67      0
2    STR2      5     4      500    1.25       1
2    STR3      54    7      500    7.71       0

I've tried a couple of ways in SQL, I thought it would be better to do it in python but I haven't been doing a great job with my loop. Here's what I tried so far:

df['AVG'] = 0
df['FINAL'] = 0

for i in df.groupby(["ID"])['SHIP']:
    if i > 0:
          df['AVG'] = df['BAL'] / df['SALES']
          df['SHIP'] = df.groupby(["ID"])['SHIP']-1
          total = df.groupby(["ID"])["FINAL"].transform("cumsum")
          df['FINAL'] = + 1
          df['A'] = + 1
    else:
         df['FINAL'] = 0
martineau
  • 119,623
  • 25
  • 170
  • 301
its.kcl
  • 123
  • 7
  • Can you explain what you mean by "So I want to spread the shipments per ID in the group one by one"? Why are you showing the [Final] column as x in your expected output when your sample code shows an integer value? Can you utilize a different data structure for iterative computations? – itprorh66 Nov 14 '21 at 21:00
  • So in the final column, I'm trying to allocate the SHIP across the FINAL column, one by one by looking at the lowest AVG column. I just used X as a placeholder of a value. I'll edit that. – its.kcl Nov 14 '21 at 21:05

1 Answers1

0

This was challenging because more than one row in the group can have the same average calculation. then it throws off the allocation.

This works on the example dataframe, if I understood you correctly.

d = {'ID': [1, 1, 1, 2,2,2], 'STOREID': ['str1', 'str2', 'str3','str1', 'str2', 'str3'],'BAL':[50, 6, 74, 35,5,54], 'SALES': [5, 7, 4, 3,4,7], 'SHIP': [18, 18, 18, 500,500,500]} 
df = pd.DataFrame(data=d)
df['AVG'] = 0
df['FINAL'] = 0

def calc_something(x):
    # print(x.iloc[0]['SHIP'])
    for i in range(x.iloc[0]['SHIP'])[0:500]:
          x['AVG'] = x['BAL'] / x['SALES']
          x['SHIP'] = x['SHIP']-1
          x = x.sort_values('AVG').reset_index(drop=True)
          # print(x.iloc[0, 2])
          x.iloc[0, 2] = x['BAL'][0] + 1
          x.iloc[0, 6] = x['FINAL'][0] + 1
    return x

df_final = df.groupby('ID').apply(calc_something).reset_index(drop=True).sort_values(['ID', 'STOREID'])
df_final

   ID STOREID  BAL  SALES  SHIP    AVG  FINAL
1   1    STR1   50      5     0 10.000      0
0   1    STR2   24      7     0  3.286     18
2   1    STR3   74      4     0 18.500      0
4   2    STR1  127      3     0 42.333     92
5   2    STR2  170      4     0 42.500    165
3   2    STR3  297      7     0 42.286    243
Jonathan Leon
  • 5,440
  • 2
  • 6
  • 14
  • It actually gave me a different output, I did the exact same thing? I got similar IDs (which is 1) – its.kcl Nov 15 '21 at 16:10
  • sorry, don't undersatnd what this means: "I got similar IDs (which is 1)" it's unclear how it's different unless you are using different input dataframe than what you provided above. – Jonathan Leon Nov 15 '21 at 16:37
  • Yes, I'm using exactly the same, but the loop gets mixed up? Here's my data frame: d = {'ID': [1, 1, 1, 2,2,2], 'STOREID': ['str1', 'str2', 'str3','str1', 'str2', 'str3'],'BAL':[50, 6, 74, 35,5,54], 'SALES': [5, 7, 4, 3,4,7], 'SHIP': [18, 18, 18, 500,500,500]} df = pd.DataFrame(data=d) – its.kcl Nov 15 '21 at 16:46
  • And when I tried running the code, I got an output of 5 rows will all IDs as 1 and it seems like all the SHIP was given to the smallest avg instead of spread out – its.kcl Nov 15 '21 at 16:48
  • ran it again with your data and get it the same correct result. you aren't moving or adding columns? .iloc[0, 2] is finicky. looking for specific rows/columns. unfortunately it's hard to say why it's not working for you. – Jonathan Leon Nov 15 '21 at 16:59
  • oh, for your logic, the ship goes to the small avg, and for id 1, it all goes to str2. but for id 2 the ships are spread out. is that what you mean? i think you need to look at the logic in the function as it may not actually be what you want, just what i understood you to be asking for. – Jonathan Leon Nov 15 '21 at 17:00
  • That's so weird, I get a different result and I copied what you have exactly - even restarted my jupyter kernel. You are priting the df_final right? My df_final prints 5 rows with all 1 as an ID. Perhaps I need to reset index somewhere? Cos my index shows 1,4,0,3,2,5 instead of 1,0,2,4,5,3 like yours – its.kcl Nov 15 '21 at 17:35
  • But I do think what you have is helpful, I'm just trying to figure out why I'm getting a different output with the exact same df – its.kcl Nov 15 '21 at 17:38
  • updated to include your data. if you copy from d through df_final, you should end up with my result. i'm on the latest pandas and python versions, maybe check that as well. – Jonathan Leon Nov 15 '21 at 17:43
  • Omg yes, it's because of the pandas version. Crazy. Thank you! – its.kcl Nov 15 '21 at 17:50