Select rows from a DataFrame based on a values in another dataframe and updating one of the column with values according to the second DataFrame

Question

I have two Dataframes df and df1.

Main DataFrame is as follows:
DF:

    start   end price
0   A   Z   1
1   B   Y   2
2   C   X   3
3   A   Z   4
4   D   W   5

Second DataFrame:
DF1:

start   end price
    0   A   Z   100
    1   B   Y   200

I want the main dataframe df to update the values in 'price' columns based on the start and end in df1. it should update column value for all the rows having the same start and end as in df1. DF:

start   end price
0   A   Z   100
1   B   Y   200
2   C   X   3
3   A   Z   100
4   D   W   5

(all A-Z and B-Y in df should get updated). Is there anyway I can get this output ? In reality the datframes have more columns but I want to update only one column(eg.'Price').

score 2 · Answer 1 · answered Sep 21 '18 at 02:02

2

First, you can merge:

s = df1.merge(df2, left_on=['start', 'end'], right_on=['start', 'end'], how='left')

Then you can fillna and index your desired columns:

s.assign(price=s.price_y.fillna(s.price_x))[['start', 'end', 'price']]

  start end  price
0     A   Z  100.0
1     B   Y  200.0
2     C   X    3.0
3     A   Z  100.0
4     D   W    5.0

answered Sep 21 '18 at 02:02

user3483203

50,081
9
65
94

thaks. I have multiple columns in the main dataframe df (except start and end), I want that too in the result. how can I modify your code to get all columns from the main dataframe df. – AISH Sep 21 '18 at 06:51

score 2 · Answer 2 · answered Sep 21 '18 at 02:07

2

Using update

df=df.set_index(['start','end'])
df.update(df1.set_index(['start','end']))
df.reset_index()
Out[99]: 
  start end  price
0     A   Z  100.0
1     B   Y  200.0
2     C   X    3.0
3     A   Z  100.0
4     D   W    5.0

answered Sep 21 '18 at 02:07

BENY

317,841
20
164
234

score 1 · Answer 3 · answered Sep 21 '18 at 05:23

1

`merge`

df.drop('price', 1).merge(df1, 'left').fillna(df)

  start end  price
0     A   Z  100.0
1     B   Y  200.0
2     C   X    3.0
3     A   Z  100.0
4     D   W    5.0

I'm going to merge on ['start', 'end'] and that pesky price is going to get in my way. So, I drop it.
I need to preserve df index because I have that repeat of 'A' and 'Z'. So, I use a 'left' merge
Now my missing elements can be filled back in with df

answered Sep 21 '18 at 05:23

piRSquared

285,575
57
475
624

1

Very nice, much cleaner – user3483203 Sep 21 '18 at 05:37
It gives me the exact output, but when I run it in a loop( in my code df1 keeps changing and makes updates to the mail dataframe df ), it gave me error saying " TypeError: unhashable type: 'dict' ". – AISH Sep 21 '18 at 07:16
Loop? What? Why? I have no idea what you are doing in your loop. I’d suggest accepting one of these answers and asking a new question that addresses the loop issue. – piRSquared Sep 21 '18 at 10:34

Select rows from a DataFrame based on a values in another dataframe and updating one of the column with values according to the second DataFrame

3 Answers3

merge

`merge`