Fastest way to merge Dataframe into one data frame

Question

I would like to merge two Dataframes into 1 fast as possible.

DF1:

    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2
3  A3  B3  C3
4  A4  B4  C4
5  A5  B5  C5
6  A6  B6  C6
7  A7  B7  C7

</pre>

DF2:

Current Results:

        A   B   C     Z
    0  A0  B0  C0   NaN
    1  A1  B1  C1   NaN
    2  A2  B2  C2   NaN
    3  A3  B3  C3  [Z4]
    4  A4  B4  C4  [Z4]
    5  A5  B5  C5  [Z5]
    6  A6  B6  C6  [Z6]
    7  A7  B7  C7  [Z7]

Required Results:

        A   B   C    Z
    0  A0  B0  C0  NaN
    1  A1  B1  C1  NaN
    2  A2  B2  C2  NaN
    3  A3  B3  C3  Z4
    4  A4  B4  C4  NaN
    5  A5  B5  C5  Z5
    6  A6  B6  C6  Z6
    7  A7  B7  C7  Z7

Three problems I have with my code. 1) I would not like the Z4 to repeat. if its null I would just like NaN.
2) I am not sure why but is coming as list when it should be string format.
3) very slow.

NOTE: Everything that is in DF2 dataframe column A will always be in D2 column A
My current Code

```
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
'B': ['B0', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7'],
'C': ['C0', 'C1', 'C2', 'C3','C4', 'C5', 'C6', 'C7']},
index=[0, 1, 2, 3, 4, 5, 6, 7])

df2 = pd.DataFrame({'A': ['A3', 'A5', 'A6', 'A7'],
'Z': ['Z4', 'Z5', 'Z6', 'Z7'],
},
index=[0, 1, 2, 3])

def mergeDF(df1, df2):
   import pandas as pd
   v = df1.merge(df2[['A', 'Z']])\
      .groupby(df1.columns.tolist())\
      .Z\
      .apply(pd.Series.tolist)

   df = pd.DataFrame(v.tolist(), index=v.index)\
      .rename(columns=lambda x: x + 1)\
      .add_prefix('Z')\
      .reset_index()

   df = df1.merge(v, how='left', on='A').ffill()

   print (df)


mergeDF(df1, df2)

print (df1)

```

Try `pd.concat([df1, df2], axis=1)` if you match on the index or join (inner) instead of merge if you match on a key column. — xletmjm, Feb 19 '20 at 23:17
That does not work. The creates 2 A columns and does't do matching across A. — mr_sp, Feb 19 '20 at 23:20
Because concat matches on the index only. Otherwise `merge` or `join` with `set_index` using the `on` argument as in the answers. — xletmjm, Feb 19 '20 at 23:25
The answer looks quite basic, have you not done any research? Why are you importing pandas multiple times, including inside of a function? — AMC, Feb 19 '20 at 23:39
Does this answer your question? [Merge two data frames based on common column values in Pandas](https://stackoverflow.com/questions/43297589/merge-two-data-frames-based-on-common-column-values-in-pandas) — AMC, Feb 19 '20 at 23:41

score 1 · Answer 1 · answered Feb 19 '20 at 23:20

1

You can do this in a much simpler way that what you've done now:

 pd.merge(df1,df2,how='left',on='A')

You've got a list because you converted v to list in your function and added ffill which will replace null values by the last non null value in the column

answered Feb 19 '20 at 23:20

fmarm

4,209
1
17
29

Wow that worked. Thanks. I put way too much time into this than required. Thanks so much. – mr_sp Feb 19 '20 at 23:24

score 1 · Accepted Answer · answered Feb 19 '20 at 23:20

1

Quick one:

>>>df1.join(df2.set_index('A'), how='outer', on='A')`
    A   B   C   Z
0   A0  B0  C0  NaN
1   A1  B1  C1  NaN
2   A2  B2  C2  NaN
3   A3  B3  C3  Z4
4   A4  B4  C4  NaN
5   A5  B5  C5  Z5
6   A6  B6  C6  Z6
7   A7  B7  C7  Z7

answered Feb 19 '20 at 23:20

xletmjm

257
1
3

Fastest way to merge Dataframe into one data frame

2 Answers2