suppose i have two df like below:
import pandas as pd
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"c": [6,7,8,9]
}
df1 = pd.DataFrame(data_dic)
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"d": [10,10,12,13]
}
df2 = pd.DataFrame(data_dic)
Result:
df1
a b c
0 0 3 6
1 0 3 7
2 1 4 8
3 2 5 9
df2
a b d
0 0 3 10
1 0 3 10
2 1 4 12
3 2 5 13
where each of df have the same key value on column 'a' and 'b'.
Q: How to merge them without duplicates? [option with merging df and then delete duplicates is unsatisfactory]
I try to do this by using below codes but this creates duplicate rows and significantly increases the file capacity.
df = pd.merge(df1, df2, on=['a', 'b'] , how='left', validate='many_to_many')
df = pd.merge(df1, df2, on=['a', 'b'] , how='inner')
Expected result:
a b c d
0 0 3 6 10
1 0 3 7 10
2 1 4 8 12
3 2 5 9 13
Thank You All for hard work.