
I want to fill the NaN values in df1 with values from df2, matching rows on colorID, while keeping all rows and columns of df1.

df1=

colorID  age  flower
red1     12   sun
red2     na   sun
green    23   hydro
red3     na   hydro
yellow   3    sun
red4     na   hydro

df2=

colorID  age
red2     4
blue     5
red3     6
red4     7

desired df3 =

colorID  age  flower
red1     12   sun
red2     4    sun
green    23   hydro
red3     6    hydro
yellow   3    sun
red4     7    hydro

I tried set_index():

df1.set_index("colorID").age.fillna(df2.set_index("colorID").age).reset_index()

but the output contains only colorID and age; the flower column is lost.

Henry Ecker
Kevin

2 Answers

2

You can pass a DataFrame to fillna after setting the index of both DataFrames to the common field, colorID:

df1 = df1.set_index('colorID')
df2 = df2.set_index('colorID')
df1 = df1.fillna(df2)
#        age flower
#colorID           
#red1     12    sun
#red2      4    sun
#green    23  hydro
#red3      6  hydro
#yellow    3    sun
#red4      7  hydro
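
As a self-contained sketch of the approach above, the snippet below rebuilds the question's sample data (representing "na" as NaN, which is an assumption about how the real data is stored) and adds a reset_index() call so colorID comes back as a regular column. The names filled and df3 are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "na" is assumed to be a real NaN.
df1 = pd.DataFrame({
    "colorID": ["red1", "red2", "green", "red3", "yellow", "red4"],
    "age": [12, np.nan, 23, np.nan, 3, np.nan],
    "flower": ["sun", "sun", "hydro", "hydro", "sun", "hydro"],
})
df2 = pd.DataFrame({
    "colorID": ["red2", "blue", "red3", "red4"],
    "age": [4, 5, 6, 7],
})

# Index both frames by colorID so fillna aligns rows by label, not position.
filled = df1.set_index("colorID").fillna(df2.set_index("colorID"))

# Restore colorID as a regular column; df1's row order is preserved.
df3 = filled.reset_index()
print(df3)
```

Because the age column held NaN, pandas stores it as float; once no gaps remain, df3["age"].astype(int) converts it back to integers.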
Mortz
  • Thank you Mortz, but don't I need to specify the columns with na? df1 = df1['age'].fillna(df2['age']) However, only ID and age are the outputs again. – Kevin Feb 17 '22 at 18:20
  • If you don't specify the columns then all columns that are common between the `df1` and `df2` will be filled in. When you do `df1=df1['age'].fillna(df2['age'])` you are changing `df1` itself by assigning it to `df1['age']` - in other words, your df1 now only contains the age column. – Mortz Feb 18 '22 at 08:06
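
The difference described in this comment can be seen in a small sketch. It uses a shortened version of the sample data, already indexed by colorID, and the name age_only is purely illustrative.

```python
import numpy as np
import pandas as pd

# Shortened sample data, indexed by colorID as in the answer above.
df1 = pd.DataFrame(
    {"age": [12, np.nan, 23], "flower": ["sun", "sun", "hydro"]},
    index=pd.Index(["red1", "red2", "green"], name="colorID"),
)
df2 = pd.DataFrame(
    {"age": [4, 5]},
    index=pd.Index(["red2", "blue"], name="colorID"),
)

# Selecting one column yields a Series; rebinding df1 to it would drop "flower".
age_only = df1["age"].fillna(df2["age"])
print(type(age_only).__name__)  # Series

# Assigning back into the column keeps every other column of df1.
df1["age"] = df1["age"].fillna(df2["age"])
print(list(df1.columns))  # ['age', 'flower']
```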
0

Another option is to fill only the column that has the NaN values, leaving the rest of the DataFrame intact:

df1['age'] = df1['age'].fillna(df2['age'])

Keep in mind that fillna aligns the two Series by index label, so both DataFrames need colorID as their index (for example via set_index('colorID')) for each value to land on the matching row.
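
If df1 should keep its default integer index, one alternative (a sketch that is not part of the answer above, and that assumes colorID values are unique in df2) is to build a colorID-to-age lookup and map it onto df1 before filling. The name lookup is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "na" is assumed to be a real NaN.
df1 = pd.DataFrame({
    "colorID": ["red1", "red2", "green", "red3", "yellow", "red4"],
    "age": [12, np.nan, 23, np.nan, 3, np.nan],
    "flower": ["sun", "sun", "hydro", "hydro", "sun", "hydro"],
})
df2 = pd.DataFrame({
    "colorID": ["red2", "blue", "red3", "red4"],
    "age": [4, 5, 6, 7],
})

# Build a colorID -> age lookup from df2 (requires unique colorID values).
lookup = df2.set_index("colorID")["age"]

# Map each df1 colorID to its df2 age (NaN where df2 has no entry),
# and use that value only where df1's own age is missing.
df1["age"] = df1["age"].fillna(df1["colorID"].map(lookup))
print(df1)
```

This leaves df1's index, row order, and other columns untouched, and rows of df2 with no counterpart in df1 (such as blue) are simply ignored.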


More examples here:

https://datascienceparichay.com/article/pandas-fillna-with-values-from-another-column/

mayo