I am trying to do a simple intersection of dataframes, but no matter the method (pd.merge
or pd.concat
) I always endup with duplicated columns.
Say the two dataframes are:
df1 = pd.DataFrame(
data = np.stack([
np.arange(4),
np.random.rand(4),
np.random.rand(4),
],axis=-1),
columns = ["A","B","C"]
)
df2 = pd.DataFrame(
data = np.stack([
np.arange(4)+2,
np.random.rand(4),
np.random.rand(4),
],axis=-1),
columns = ["A","B","C"]
)
df1,df2
( A B C
0 0.0 0.165849 0.516438
1 1.0 0.116661 0.395433
2 2.0 0.758229 0.386642
3 3.0 0.057719 0.802317,
A B C
0 2.0 0.899924 0.545800
1 3.0 0.792242 0.187410
2 4.0 0.332168 0.778160
3 5.0 0.164429 0.509884)
The result I want to achieve is this:
Source A B C
2 df1 2.0 0.758229 0.386642
3 df1 3.0 0.057719 0.802317
0 df2 2.0 0.899924 0.545800
1 df2 3.0 0.792242 0.187410
I tried with pd.concat, but I have duplicated columns:
# Try with pd.concat inner
for df in [df1,df2]:
df.set_index("A",inplace=True)
df = pd.concat([df1,df2],join="inner",axis=1).reset_index()
df
A B C B C
0 2.0 0.758229 0.386642 0.899924 0.54580
1 3.0 0.057719 0.802317 0.792242 0.18741
And pd.merges also gives duplicated columns:
df = pd.merge(
left = df1,
right = df2,
how="inner",
on = "A"
)
df
A B_x C_x B_y C_y
0 2.0 0.758229 0.386642 0.899924 0.54580
1 3.0 0.057719 0.802317 0.792242 0.18741
Anyone has a clean way to do this in pandas ?
Thanks