1

I have two dataframes with unique ids. They share some columns but not all. I need to create a combined dataframe which will include rows from missing ids from the second dataframe. Tried merge and concat, no luck. It's probably too late, my brain stopped working. Will appreciate your help!

   df1 = pd.DataFrame({
        'id': ['a','b','c','d','f','g','h','j','k','l','m'],
        'metric1': [123,22,356,412,54,634,72,812,129,110,200],
        'metric2':[1,2,3,4,5,6,7,8,9,10,11]
        })


df2 = pd.DataFrame({
    'id': ['a','b','c','d','f','g','h','q','z','w'],
    'metric1': [123,22,356,412,54,634,72,812,129,110]
    })
df2

The result should look like this:

    id  metric1 metric2
0   a   123     1.0
1   b   22      2.0
2   c   356     3.0
3   d   412     4.0
4   f   54      5.0
5   g   634     6.0
6   h   72      7.0
7   j   812     8.0
8   k   129     9.0
9   l   110     10.0
10  m   200     11.0
11  q   812     NaN
12  z   129     NaN
13  w   110     NaN
aviss
  • 2,179
  • 7
  • 29
  • 52

1 Answers1

6

In this case using combine_first

df1.set_index('id').combine_first(df2.set_index('id')).reset_index()
Out[766]: 
   id  metric1  metric2
0   a    123.0      1.0
1   b     22.0      2.0
2   c    356.0      3.0
3   d    412.0      4.0
4   f     54.0      5.0
5   g    634.0      6.0
6   h     72.0      7.0
7   j    812.0      8.0
8   k    129.0      9.0
9   l    110.0     10.0
10  m    200.0     11.0
11  q    812.0      NaN
12  w    110.0      NaN
13  z    129.0      NaN
BENY
  • 317,841
  • 20
  • 164
  • 234