df_a
and df_b
are two dataframes that looks like following
df_a
A B C D E
x1 Apple 0.3 0.9 0.6
x1 Orange 0.1 0.5 0.2
x2 Apple 0.2 0.2 0.1
x2 Orange 0.3 0.4 0.9
x2 Mango 0.1 0.2 0.3
x3 Orange 0.3 0.1 0.2
df_b
A B_new F
x1 Apple 0.3
x1 Mango 0.2
x1 Orange 0.1
x2 Apple 0.2
x2 Orange 0.3
x2 Mango 0.1
x3 Orange 0.3
x3 Mango 0.2
x3 Apple 0.1
I want my final_df
to contain all the rows contained in df_a
such that it contemplates the unique combination of df_a['A'] == df_b['A']
and df_a['B'] == df_b['B_new']
.
I've tried doing outer join and then drop duplicates w.r.t columns A and B in final_df
but the value of B_new is not retained.
Following is how I want my result_df
to look like:
result_df
A B C D E B_new F
x1 Apple 0.3 0.9 0.6 Apple 0.3
x1 Orange 0.1 0.5 0.2 Orange 0.1
x2 Apple 0.2 0.2 0.1 Apple 0.2
x2 Orange 0.3 0.4 0.9 Orange 0.3
x2 Mango 0.1 0.2 0.3 Mango 0.1
x3 Orange 0.3 0.1 0.2 Orange 0.3
I also tried left outer join:
final_df = pd.merge(df_a, df_b, how="left", on=['A'])
The size of this dataframe is a union of df_a
and df_b
which is not what I want.
Appreciate any suggestions.