How to merge pandas data frames without getting duplicate rows

Question

I have 2 data frames, each with a common value in column C. I am using pd.merge to merge these data frames, but i dont want duplicate rows for each matching pair. This will be more clearly visualised below:

DF1

df = pd.DataFrame(columns=['A','B','C'])
df['A'] = ['2_21','2_34']
df['B'] = ['X1','X2']
df['C'] = ['2','2']
print(df)

     A  B  C
0  2_21 X1 2
1  2_34 X2 2

DF2

df2 = pd.DataFrame(columns=['D','E','C'])
df2['D'] = ['2_27','2_35']
df2['E'] = ['Y1','Y2']
df2['C'] = ['2','2']
print(df)
     D  E  C        
0  2_27 Y1 2
1  2_35 Y2 2`

Unwanted output/dataframe:

df1.merge(df2, how='left', on='C')

    A   B  C  D    E
0  2_21 X1 2  2_27 Y1
1  2_21 X1 2  2_35 Y2
2  2_34 X2 2  2_27 Y1
3  2_34 X2 2  2_35 Y2

Desired output/dataframe

    A   B  C  D    E
0  2_21 X1 2  2_27 Y1
1  2_34 X2 2  2_35 Y2

I have tried filtering the unwanted output/data frame by removing duplicates which was answered herebut this doesnt work as i need to split the columns up into subcolumns and remove duplicates of the subcolumns, not individual cells (which didnt seem practical after attempting).

I also cant simply remove the unwanted rows in the unwanted dataframe as there are many randomly occurring rows that need manual removal.

The pandas docs on merge are here but it doesnt seem to allow this.

Does anyone know of a possible way to achieve this? I really appreciate your help in advance as ive been stuck on this for a while. Thank you

score 0 · Answer 1 · answered Mar 17 '23 at 18:35

I you really want to merge all C's in df to all C's in df2 based on position, you need to create a join helper column like I did here with 'key':

df.assign(key=df.groupby('C').cumcount()).merge(df2.assign(key=df2.groupby('C').cumcount()))

Output:

      A   B  C  key     D   E
0  2_21  X1  2    0  2_27  Y1
1  2_34  X2  2    1  2_35  Y2

Then you can drop key column in results.

Of if you just want to concatenate the two dataframe side by side, then use pd.concat([df, df2], axis=1).

How to merge pandas data frames without getting duplicate rows

1 Answers1