4

I have two dataframes:

DF ONE:

ID     A    B    C
 1     x    y    z
 1     x    y    z
 2     x    y    z
 2     x    y    z
 2     x    y    z
 3     x    y    z

DF TWO:

ID     D    E    F
 1     a    b    c1
 2     a    b    c2
 3     a    b    c3

I want to take column E for example from DF TWO, and put it on DF ONE, if the ID is the same, so after I will get this output:

ID     A    B    C    F
 1     x    y    z    c1
 1     x    y    z    c1
 2     x    y    z    c2
 2     x    y    z    c2
 2     x    y    z    c2
 3     x    y    z    c3

thank you for your help

TheDaJon
  • 545
  • 2
  • 8
  • 24
  • You might see my question insightful to your problem: http://stackoverflow.com/questions/43311266/merging-two-dataframe-on-column-and-index – Alex T Apr 19 '17 at 09:40

3 Answers3

5

You can use map by dict:

d = df2.set_index('ID')['F'].to_dict()
print (d)
{1: 'c1', 2: 'c2', 3: 'c3'}

df1['F'] = df1['ID'].map(d)
print (df1)
   ID  A  B  C   F
0   1  x  y  z  c1
1   1  x  y  z  c1
2   2  x  y  z  c2
3   2  x  y  z  c2
4   2  x  y  z  c2
5   3  x  y  z  c3

Another solution is map by Series:

s = df2.set_index('ID')['F']
print (s)
ID
1    c1
2    c2
3    c3
Name: F, dtype: object

df1['F'] = df1['ID'].map(s)
print (df1)
   ID  A  B  C   F
0   1  x  y  z  c1
1   1  x  y  z  c1
2   2  x  y  z  c2
3   2  x  y  z  c2
4   2  x  y  z  c2
5   3  x  y  z  c3

Timings:

#[60000 rows x 5 columns]
df1 = pd.concat([df1]*10000).reset_index(drop=True)

In [115]: %timeit pd.merge(df1, df2[['ID', 'F']],how='left')
100 loops, best of 3: 11.1 ms per loop

In [116]: %timeit df1['ID'].map(df2.set_index('ID')['F'])
100 loops, best of 3: 3.18 ms per loop

In [117]: %timeit df1['ID'].map(df2.set_index('ID')['F'].to_dict())
100 loops, best of 3: 3.36 ms per loop

In [118]: %timeit df1['ID'].map({k:v for k, v in df2[['ID', 'F']].as_matrix()})
100 loops, best of 3: 3.44 ms per loop

In [119]: %%timeit 
     ...: df2.index = df2['ID']
     ...: df1['F1'] = df1['ID'].map(df2['F'])
     ...: 
100 loops, best of 3: 3.33 ms per loop
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
2

You need to create maps from df2 and you can do it like this:

mapping = {k:v for k, v in df2[['ID', 'F']].as_matrix()}

Then just apply them to df1:

df1['F'] = df1['ID'].map(mapping)

Or you can use:

df1 = pd.merge(df1, df2[['ID', 'F']],how='left')
zipa
  • 27,316
  • 6
  • 40
  • 58
1

You can use map, taking care of setting ID as index of your dataframe TWO with df2.index = df2['ID']:

In [10]: df1['F'] = df1['ID'].map(df2['F'])

In [11]: df1
Out[11]: 
   ID  A  B  C   F
0   1  x  y  z  c1
1   1  x  y  z  c1
2   2  x  y  z  c2
3   2  x  y  z  c2
4   2  x  y  z  c2
5   3  x  y  z  c3
Fabio Lamanna
  • 20,504
  • 24
  • 90
  • 122
  • not working. maybe it is because `ID` column in my dataframes isn't an index column? – TheDaJon Apr 19 '17 at 09:59
  • Yes, you should set it as index in your second dataframe. If you're importing from a .csv file try `pd.read_csv(filename.csv,index_col='ID')`, otherwise set it with `df2.index = df2['ID']`. – Fabio Lamanna Apr 19 '17 at 10:38