3

I have dataframe like as below

Re_MC,Fi_MC,Fin_id,Res_id,     
1,2,3,4
,7,6,11
11,,31,32
,,35,38

df1 = pd.read_clipboard(sep=',')

I would like to fillna based on two steps

a) First, compare only Re_MC and Fi_MC. If a value is missing in either of these columns, copy it from the other column.

b) Despite doing step a, if there is still NA for either Re_MC or Fi_MC, copy values from Fin_id for Fi_MC and Res_id for Re_MC.

So, I tried the below two approaches

Approach 1 - This works but not efficient/elegant

df1['Re_MC'] = df1['Re_MC'].fillna(df1['Fi_MC'])
df1['Fi_MC'] = df1['Fi_MC'].fillna(df1['Re_MC'])
df1['Re_MC'] = df1['Re_MC'].fillna(df1['Res_id'])
df1['Fi_MC'] = df1['Fi_MC'].fillna(df1['Fin_id'])

Approach 2 - This doesn't work and provide incorrect output

df1['Re_MC'] = df1['Re_MC'].fillna(df1['Fi_MC']).fillna(df1['Res_id'])
df1['Fi_MC'] = df1['Fi_MC'].fillna(df1['Re_MC']).fillna(df1['Fin_id'])

Is there any other efficient way to fillna in a sequential manner? Meaning, we do step a first and then based on result of step a, we do step b

I expect my output to be like as shown below

enter image description here

updated code

df_new = (df_new 
 .fillna({'Re MC': df_new['Re Cust'],'Re MC': df_new['Re Cust_System']})
 .fillna({'Fi MC' : df_new['Fi.Fi Customer'],'Final MC':df_new['Re.Fi Customer']})
 .fillna({'Fi MC' : df_new['Re MC']})
 .fillna({'Class Fi MC':df_new['Re MC']})
)
The Great
  • 7,215
  • 7
  • 40
  • 128

1 Answers1

3

You can use dictionaries in fillna:

(df1
 .fillna({'Re_MC': df1['Fi_MC'], 'Fi_MC': df1['Re_MC']})
 .fillna({'Re_MC': df1['Res_id'], 'Fi_MC': df1['Fin_id']})
)

output:

   Re_MC  Fi_MC  Fin_id  Res_id
0    1.0    2.0       3       4
1    7.0    7.0       6      11
2   11.0   11.0      31      32
3   38.0   35.0      35      38
mozway
  • 194,879
  • 13
  • 39
  • 75
  • but how does it do in order? is that how dict keys work? – The Great May 20 '22 at 07:40
  • Yes dictionaries are ordered in python since 3.6, so the operations are applied in the order of definition ;) – mozway May 20 '22 at 07:41
  • 1
    @TheGreat I know, it's no blame from my side. But commenting just to say it is unnecessary and is seen as pollution as eventually someone will flag the comment as unnecessary and this gives mods more work to check and delete them. But thanks for you appreciation anyways ;) – mozway May 20 '22 at 07:46
  • one minor weird issue. When I run the above code, it doesn't fill all NA's in those column at once. So, when I run the jupyter notebook cell again for 2nd time, it fills all NA's. Is there any reason why all NA's for respective columns are not filled at once (instead we have to execute twice)? – The Great May 22 '22 at 06:15
  • you can see the updated code at the botton of the post to see what I am doing. Is it because of duplicate keys? – The Great May 22 '22 at 06:22
  • I think it's not unexpected and inherent to the logic. As you fill data the columns change, so when you run the code again you get a new iteration. Example you have 3 cols A,B,C you fill A with B, B with A, B with C. If you have NaNs in a row of both A and B (but not C), you won't fill A until the second iteration. – mozway May 22 '22 at 07:14