Joining two pandas dataframes based on multiple conditions

Question

df_a and df_b are two dataframes that looks like following

df_a
A   B       C      D     E
x1  Apple   0.3   0.9    0.6
x1  Orange  0.1   0.5    0.2
x2  Apple   0.2   0.2    0.1
x2  Orange  0.3   0.4    0.9
x2  Mango   0.1   0.2    0.3
x3  Orange  0.3   0.1    0.2


df_b
A   B_new   F    
x1  Apple   0.3  
x1  Mango   0.2  
x1  Orange  0.1   
x2  Apple   0.2   
x2  Orange  0.3     
x2  Mango   0.1  
x3  Orange  0.3  
x3  Mango   0.2  
x3  Apple   0.1

I want my final_df to contain all the rows contained in df_a such that it contemplates the unique combination of df_a['A'] == df_b['A'] and df_a['B'] == df_b['B_new'].

I've tried doing outer join and then drop duplicates w.r.t columns A and B in final_df but the value of B_new is not retained.

Following is how I want my result_df to look like:

result_df

 A   B       C      D     E   B_new  F
x1  Apple   0.3   0.9    0.6  Apple  0.3
x1  Orange  0.1   0.5    0.2  Orange 0.1
x2  Apple   0.2   0.2    0.1  Apple   0.2 
x2  Orange  0.3   0.4    0.9  Orange  0.3
x2  Mango   0.1   0.2    0.3  Mango   0.1
x3  Orange  0.3   0.1    0.2  Orange  0.3

I also tried left outer join:

final_df = pd.merge(df_a, df_b, how="left", on=['A'])

The size of this dataframe is a union of df_a and df_b which is not what I want.

Appreciate any suggestions.

score 24 · Accepted Answer · answered Nov 30 '18 at 00:22

You need an inner merge, specifying both merge columns in each case:

res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])

print(res)

    A       B    C    D    E   B_new    F
0  x1   Apple  0.3  0.9  0.6   Apple  0.3
1  x1  Orange  0.1  0.5  0.2  Orange  0.1
2  x2   Apple  0.2  0.2  0.1   Apple  0.2
3  x2  Orange  0.3  0.4  0.9  Orange  0.3
4  x2   Mango  0.1  0.2  0.3   Mango  0.1
5  x3  Orange  0.3  0.1  0.2  Orange  0.3

score 2 · Answer 2 · edited Sep 30 '19 at 05:07

2

You can still achieve this with a left join which is very ideal.
See below:

final_df = pd.merge(df_a, df_b[['A', 'B_new','F']], how="left", left_on=['A', 'B'], right_on=['A', 'B_new']);

edited Sep 30 '19 at 05:07

Norbert Bartko

2,468
1
17
36

answered Sep 30 '19 at 02:45

Daniel

68
5

Joining two pandas dataframes based on multiple conditions

2 Answers2

Linked