0

I am very new to python or pandas coding. so I am kind of struck here and any input is appreciated. I have two df, individually ordered based on a criteria. df1 : list of orders with quantity df2 : list of inventories with quantity and date available. quantity necessarily not to equal to order quantity.

I need to pop first order in df1 and keep popping inventory in df2 until order quantity satisfied and also maintain, how many inventory I took to fulfill the order

Any help would be greatly appreciated

Example 

df 1: 
order1 5
order2 4
order3 5

df 2: inventories 
inven1 7 07/06/2020
inven2 7 07/10/2020

my final output should look like this.

inven1 | 5 | 07/06/2020 | order1 
inven1 | 2 | 07/06/2020 | order2 
inven2 | 2 | 07/10/2020 | order2
inven2 | 5 | 07/10/2020 | order3 

@Ian Thanks for your answer it works perfectly fine, only problem I am having is, these inventory and order have to be matched based on a metaID, inventories in metaId should be applied to order of the same metaID.

I tried looping through the metaID's and performing the order inventory match as below. I get the result I wanted, but the biggest problem is it take 20 mins for a data size for just 2K rows.

I tried solving it by sorting the orders and inventories by metaID, and passing the entire dataframe. it works when Order and inventories have exact matching count,

df 1: 
metaId1 order1 5
metaId1 order2 4
metaId1 order3 5
metaId2 order4 10

df 2: inventories 
metaId1 inven1 7 07/06/2020
metaId1 inven2 7 07/10/2020
metaId2 inven3 10 07/10/2020

my final output is all fine.

metaId1 | inven1 | 5 | 07/06/2020 | order1 
metaId1 | inven1 | 2 | 07/06/2020 | order2 
meatId1 | inven2 | 2 | 07/10/2020 | order2
metaId1 | inven2 | 5 | 07/10/2020 | order3 
metaId2 | inven3 | 10 | 07/10/2020 | order4 

but crossover between metaId's happened when the inventory is excess or short, cause we are matching based on the index. how can you match based on metaId then by index within. the for loop solution I had takes forever for huge data and i know thats not the right solution. These order matching under each metaId can happen in parallel and results can be merged. That how I did it with for loop. I dunno how to parallelize them in python/pandas or dunno is there a better solution than parallel computation. It seems like I need to apply groupBy-appLy-combine concept, but the I see examples of that concept with single dataframe and apply some logic for that row, then combine it. My issue is, I have two dataframes, have to group by and apply the "combine-first()" solution to that group and combine the results from each group. I not sure how to do it. can you help me on this.

current problem 

df 1: 
metaId1 order1 5
metaId1 order2 4
metaId1 order3 5
metaId2 order4 10

df 2: inventories 
metaId1 inven1 7 07/06/2020
metaId1 inven2 5 07/10/2020
metaId2 inven3 10 07/10/2020


final output : --> not right 

metaId1 | inven1 | 5 | 07/06/2020 | order1 
metaId1 | inven1 | 2 | 07/06/2020 | order2 
meatId1 | inven2 | 2 | 07/10/2020 | order2
metaId1 | inven2 | 3 | 07/10/2020 | order3 
metaId2 | inven3 | 2 | 07/10/2020 | order3 -->crossover
metaId2 | inven3 | 8 | 07/10/2020 | order4

expected output : 
metaId1 | inven1 | 5 | 07/06/2020 | order1 
metaId1 | inven1 | 2 | 07/06/2020 | order2 
meatId1 | inven2 | 2 | 07/10/2020 | order2
metaId1 | inven2 | 3 | 07/10/2020 | order3 
metaId1 | short  | 2 | 07/10/2020 | order3-->short by 2 inven to fulfill order
metaId2 | inven3 | 10 | 07/10/2020 | order4
Subha
  • 59
  • 7

1 Answers1

0

Create df1:

From order 1 = 5, create 5 elmenets of [order 1, order 1, order 1, order 1, order 1] in a single row. The same as with order 2 and order 3

import pandas as pd

df1 = pd.DataFrame({'Order Number':['order 1','order 2','order 3'],
              'Quantity':[5,4,3]})

df1 = df1.set_index('Order Number')
df1  = df1.loc[df1.index.repeat(df1['Quantity'])]
df1

Create df2:

From inven1 = 7, create 7 elements of [inven1,inven1,inven1,inven1,inven1,inven1] in a single row. The same as with inven2

df2  = pd.DataFrame({'Batch':['inven1','inven2','inven3'],
                    'Quantity':[7,4,10],
                    'Date Available':['2020-07-06','2020-07-10','2020-07-12']
                    })

df2 = df2.set_index('Date Available')
df2 = df2.loc[df2.index.repeat(df2['Quantity'])]
df2

Create df3:

Place all order in one column and all inven1 in another column to match them one is to one.

Make sure to replace NAs under Order to determine which inventory doesn't have a matching order number.

df3 = df1.reset_index().combine_first(df2.reset_index()).reset_index()
#Make sure to replace NAs with "Available Stock"
df3['Order Number']  = df3['Order Number'].fillna('Available Stock')
df3

Create df4:

Use groupby to create a pivot table showing the count per order filled in by inventory.

df4 = df3.groupby(['Batch','Date Available','Order Number']).count().reset_index().drop(labels = ['index'], axis = 1)
df4

Result should show:

    Batch   Date Available  Order Number       Quantity
0   inven1  2020-07-06      order 1            5
1   inven1  2020-07-06      order 2            2
2   inven2  2020-07-10      order 2            2
3   inven2  2020-07-10      order 3            2
4   inven3  2020-07-12      Available Stock    9
5   inven3  2020-07-12      order 3            1

Sources:

  1. Concatenate two dataframes of different sizes (pandas)
  2. Repeat rows in a pandas DataFrame based on column value
  • this worked :) would you mind adding some explanation? I would like to learn why you did what you did. – Subha Jun 22 '20 at 04:14
  • Hello Ian would you help me on how to get the excess inventory if available? I am kind of loosing when I have excess inventory in the final output. how can I solve that? Thank you so much for your help – Subha Jul 02 '20 at 00:12
  • Hello Ian, I have updated my question with a additional problem I am facing. Would you mind taking a look? – Subha Jul 05 '20 at 00:29
  • I use Pyspark and Pandas combination, as for most I use Pyspark, only for this order-inventory application In use pandas. so in my pyspark df I used df.cache(). but I still use for loop to go over each metaId in order df and inventory df. which is an anti-pattern – Subha Jul 06 '20 at 18:47