How filter an inner dataframe based on the row of an outer dataframe

Question

**My goals: ** filter an inner dataframe based on the row (then the values) of an outer dataframe

Let's consider three dataframes:

first a list of shop with detail about the shop:

Shop = pd.DataFrame({
                    'Shop_number': ['SH10','SH20', 'SH30','SH40' ,'SH50','SH60','SH70'],
                    'Shop': ['Chisinau','Milan', 'Paris','Madrid', 'Lyon','London', 'foo']
                    })
Shop

enter image description here

then a dataframe containing the sales records:

Sales = pd.DataFrame({
                        'Shop_number': ['SH20', 'SH30', 'SH50','SH70'],
                        'Amount':[200,400,50,1000],
                        'Type' : 'Actual'
                    })
Sales

Which gives:
enter image description here

then a dataframe containing the budget sales records:

Budget = pd.DataFrame({
                        'Shop_number': ['SH10','SH20', 'SH30','SH40' ,'SH50','SH60','SH70'],
                        'Amount':[100,150,200,300,40,500,150],
                        'Type' : 'Budget'
                    })
print(Budget)

enter image description here

Let's summarize data as is:

base=pd.concat([Sales,Budget])
base

Then what I try to achieve is express Sales and Budget based on Shop table as:

enter image description here

(Shop
    .assign(
        Budget = lambda x : 
            base.loc[(base['Shop_number']=='SH20') & (base['Type']=='Budget')].loc[:,'Amount'].sum(),
        Actual = lambda x : 
            base.loc[(base['Shop_number']=='SH20') & (base['Type']=='Actual')].loc[:,'Amount'].sum(),
        var = lambda x : 
            x.Budget-x.Actual
    )
)

Then it work as intended:

enter image description here

But when I try to make it dynamic then it fails:

(Shop
    .assign(
        Budget = lambda x : 
            base.loc[(base['Shop_number']==x.Shop_number) & (base['Type']=='Budget')].loc[:,'Amount'].sum(),
        Actual = lambda x : 
            base.loc[(base['Shop_number']==x.Shop_number) & (base['Type']=='Actual')].loc[:,'Amount'].sum(),
        var = lambda x : 
            x.Budget-x.Actual
    )
)

and returns a trace back:

ValueError: Can only compare identically-labeled Series objects

My question is:

How would you do to make it work? Would you handle the purpose of this calculation like this or on another more optimal way?

I try what is in the description then it failed. Of course I tried some make around but it failed as well can't find the solution

Boubacar Traoré · Answer 1 · 2023-06-05T13:44:30.247

The reason you got the error is because base and Shop don't have the same length (number of rows). When you use .assign with lambda, x will anonymously replace your Shop dataframe. In fact, the filtering can't be done in this case due to length mismatch (impossible comparison in general). You can try this, and you'll have the same issue.

print(base['Shop_number']==Shop.Shop_number)

ValueError: Can only compare identically-labeled Series objects

I suggest defining a function that do the filtering job for you. Assuming we have initialized all dataframe as you mentioned above :

# this function will be applied to a "DataFrameGroupBy" object and compute results
def get_info(df):
    budget = df.query('Type=="Budget"').Amount.sum()
    actual = df.query('Type=="Actual"').Amount.sum()
    var    = budget - actual
    return budget, actual, var

Then you can call the function by applying it to your groupby and get the exact result :

Shop.merge(
    base.groupby('Shop_number').apply(get_info).apply(pd.Series).reset_index(),
    on='Shop_number',
    how='left'
).set_axis(['Shop_number','Shop','Budget','Actual','var'], axis=1)

Output

  Shop_number      Shop  Budget  Actual  var
0        SH10  Chisinau     100       0  100
1        SH20     Milan     150     200  -50
2        SH30     Paris     200     400 -200
3        SH40    Madrid     300       0  300
4        SH50      Lyon      40      50  -10
5        SH60    London     500       0  500
6        SH70       foo     150    1000 -850

Note that the set_axis is just for renaming column names, you can change it and use anything else you want (like .rename).

Hope this help.

Hi Boubacar,the aim is to have base on table shop the dataframe contening the sales and budget dynamically filtered , at each different row — send_, Jun 04 '23 at 13:52
I edited my answer now, I think I understood your problem now. Let me know if it solved your issue. — Boubacar Traoré, Jun 05 '23 at 13:45
Looks great , let me try That tomorrow, i Will tell you how it works , cheers — send_, Jun 05 '23 at 19:41

How filter an inner dataframe based on the row of an outer dataframe

1 Answers1