how to merge this two pandas dataframes to one in python

Question

import pandas as pd
data1 = [[2000,'size56'],[2001,'size56'],[2002,'size09']]
columns1=['a','b']
df1 = pd.DataFrame(data=data1,columns=columns1)
print(df1)

data2 = [['3000','size56'],['3001','size56'],['3002','size56'],['3003','size09']]
columns2=['a','b']
df2 = pd.DataFrame(data=data2, columns=columns2)


df1['g'] = df1.groupby(['b']).cumcount()
df2['g'] = df2.groupby(['b']).cumcount()

df = pd.merge(df1, df2, on=['b', 'g'] , how='inner')
print(df)

df1['g'] = df1.groupby(['b']).cumcount() df2['g'] = df2.groupby(['b']).cumcount()

df = pd.merge(df1, df2, on=['b', 'g'] , how='inner') print(df)

import pandas as pd
data1 = [[2000,'size56'],[2001,'size56'],[2002,'size09']]   
columns1=['a','b']
df1 = pd.DataFrame(data=data1,columns=columns1)
print(df1)

data2 = [['3000','size56'],['3001','size56'],['3002','size56'],['3003','size09']]   
columns2=['A','B']
df2 = pd.DataFrame(data=data2, columns=columns2)

I want to merge the following two df1 and df2 in pandas (python)

df1:

   a     b
0  2000  size56
1  2001  size56
2  2002  size09

df2:

   A   B
0  3000  size56
1  3001  size56
2  3002  size56
3  3003  size09

I hope to get this result:

df3:

    a     b       A     B
0  2000  size56  3000  size56   
1  2001  size56  3001  size56   
2  2002  size09  3003  size09

Any suggestion is highly appreciated!

Jason Cai

please use the following code to get the answer. Thanks a lot!

df1['g'] = df1.groupby(['b']).cumcount() df2['g'] = df2.groupby(['b']).cumcount()

Does this answer your question? [Concatenate rows of two dataframes in pandas](https://stackoverflow.com/questions/28135436/concatenate-rows-of-two-dataframes-in-pandas) — Panagiotis Simakis, May 20 '20 at 13:01
Btw. you can use `df.to_clipboard()` to copy your DataFrame for posting it on Stackoverflow. — Milo, May 20 '20 at 13:05
Pure code-writing requests are off-topic on Stack Overflow — we expect questions here to relate to *specific* programming problems — but we will happily help you write it yourself! Tell us [what you've tried](https://stackoverflow.com/help/how-to-ask), and where you are stuck. This will also help us answer your question better. — rizerphe, May 20 '20 at 13:06
@jezrael: are you sure it is a true duplicate? For index 2 row contains 3003 in A because it is the first row with B=size09... — Serge Ballesta, May 20 '20 at 13:21
@CaiJason: I am still unsure that your question is a duplicate. Unfortunately, it is not precise enough for me to be sure. If you think that the duplicate does not answer your problem, you should edit the question explaining what is your exact requirement and why the proposed duplicate does not answer it. Then ping me in a comment and I will re-examine it. — Serge Ballesta, May 20 '20 at 13:48
Let me explain what I want to do. In df1, column_a represents the address of a disk such as address 2000, and column_b represents the capacity of address 2000 disk such as size56. In df2, column_A and column_B are the same as column_a and column_b. I plan to copy the contents of the disk in df1 to the disk in df2, as long as the two sides have the same capacity. For example, the disk at address 2000 is copied to the disk at address 3000 because the capacity of the disks on both sides is the same. I need to create df3 to establish the relationship between the disks on both sides. — Cai Jason, May 20 '20 at 14:29

Sajan · Accepted Answer · 2020-05-21T13:59:43.350

1

You could try this ( this was inspired by @jezrael post - pandas merge df many to many without duplicates )

import pandas as pd
df1['c'] = df1.groupby(['b']).cumcount()
df2['c'] = df2.groupby(['B']).cumcount()
df = pd.merge(df1,df2,left_on=['b','c'], right_on=['B','c'])
df
    a       b    c     A       B
0  2000  size56  0   3000   size56
1  2001  size56  1   3001   size56
2  2002  size09  0   3003   size09

edited May 21 '20 at 13:59

answered May 20 '20 at 13:10

Sajan

1,247
1
5
13

out[2] a b A B 0 2000 size56 3000 size56 1 2000 size56 3001 size56 2 2000 size56 3002 size56 3 2001 size56 3000 size56 4 2001 size56 3001 size56 5 2001 size56 3002 size56 6 2002 size09 3003 size09 it isn't my need. – Cai Jason May 20 '20 at 14:40
Please check edited answer. – Sajan May 21 '20 at 14:00

score 0 · Answer 2 · answered May 20 '20 at 13:15

0

Since you're trying to relate on both dataframe's index, you can simply use join function. There's a very good section from the Pandas doc on Merge, join, and concatenate which explains various ways to relate multiple dataframes together. For your example, refer to below.

d1 = [
    {'a': 2000, 'b': 'size56'},
    {'a': 2001, 'b': 'size56'},
    {'a': 2002, 'b': 'size09'},
]
d2 = [
    {'A': 3000, 'B': 'size56'},
    {'A': 3001, 'B': 'size56'},
    {'A': 3002, 'B': 'size56'},
    {'A': 3003, 'B': 'size09'},
]
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

df1.join(df2)

Out[2]:
      a       b     A       B
0  2000  size56  3000  size56
1  2001  size56  3001  size56
2  2002  size09  3002  size56

answered May 20 '20 at 13:15

Tommy

346
1
5

out[2]:2 2002 size09 3002 size56 I hope to get 2002 size09 3003 size09 – Cai Jason May 20 '20 at 14:38
Please edit your question to show the actual output you want since obviously you want a different output than what you have provided. Also please provide more information on how you want to relate the two dataframes. – Tommy May 21 '20 at 00:19
df1['g'] = df1.groupby(['b']).cumcount() df2['g'] = df2.groupby(['b']).cumcount() df = pd.merge(df1, df2, on=['b', 'g'] , how='inner') I got the answer,please see it above. Thanks a lot! – Cai Jason May 21 '20 at 02:16

how to merge this two pandas dataframes to one in python

2 Answers2