I am trying to learn ML by solving the Titanic ML problem on Kaggle, and many of the Age values are missing. I am following a tutorial to solve the problem.

If I use

median_age = df.groupby('Title')['Age'].transform('median')

then it gives me something like

>>> median_age.sample(12)

PassengerId
1053     4.0
891     29.0
524     35.0
570     29.0
1236     4.0
463     29.0
359     22.0
403     22.0
604     29.0
791     29.0
512     29.0
1290    29.0
Name: Age, dtype: float64

These are the median ages for each passenger's Title category.
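
To make the shape of that result concrete, here is a minimal sketch on made-up data (the five toy rows and their values are assumptions, not the actual Titanic data). It contrasts a plain group median, which has one row per Title, with `transform('median')`, which has one row per passenger and keeps the original index:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame (hypothetical values).
df = pd.DataFrame(
    {
        "Title": ["Mr", "Mr", "Miss", "Miss", "Mr"],
        "Age": [30.0, np.nan, 20.0, np.nan, 40.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="PassengerId"),
)

# Plain aggregation: one median per Title (NaN ages are skipped).
per_group = df.groupby("Title")["Age"].median()

# transform: the group median repeated for every row of that group,
# indexed by PassengerId exactly like df itself.
per_row = df.groupby("Title")["Age"].transform("median")

print(per_group)
print(per_row)
```

Because `per_row` carries the same `PassengerId` index as `df`, each passenger already has the median of their own Title group next to them.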

My question is:

If I use `df['Age'].fillna(median_age, inplace=True)`, how does it fill the values? How does this last line work internally?

Deshwal
  • `df['Age'].fillna(median_age, inplace=True)` finds the median for the whole `Age` column and puts this value into the NaNs in this column. – Quant Christo Oct 28 '19 at 13:14
  • How does it know the `Age` category and what to fill in for each `NaN`, given that `median_age` is just a `Series`? – Deshwal Oct 28 '19 at 13:17
  • @Deshwal - it works because `df` and `median_age` have the same index and the same length, so the values line up by index label. – jezrael Oct 28 '19 at 13:18
  • Okay, so it goes to each index where there is a `NaN` and matches it using the `PassengerId`? Meaning whatever value `median_age` holds at that index, it just copies that value in place of the `NaN`? – Deshwal Oct 28 '19 at 13:20
  • Yes, exactly. It just copies the value in place of the NaN. – jezrael Oct 28 '19 at 13:24
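
The index matching described in the comments can be checked with a small sketch. The data below is made up (toy rows, not the Titanic set), and the names `filled`, `shuffled`, and `filled_shuffled` are only illustrative. It shows that `fillna` with a Series touches only the `NaN` positions, and that it looks up the replacement by index label rather than by row position:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame (hypothetical values).
df = pd.DataFrame(
    {
        "Title": ["Mr", "Mr", "Miss", "Miss", "Mr"],
        "Age": [30.0, np.nan, 20.0, np.nan, 40.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="PassengerId"),
)

# One median per row, indexed by PassengerId like df.
median_age = df.groupby("Title")["Age"].transform("median")

# Only NaN ages are replaced; existing ages are left untouched.
filled = df["Age"].fillna(median_age)
print(filled)

# Reorder the fill values: the result is unchanged, because pandas
# aligns the fill Series to df's index labels before filling.
shuffled = median_age.sample(frac=1, random_state=0)
filled_shuffled = df["Age"].fillna(shuffled)
print(filled_shuffled.equals(filled))
```

The sketch assigns the result to a new variable instead of using `inplace=True` on `df['Age']`. As far as I know, recent pandas versions warn about that chained in-place pattern, so `df['Age'] = df['Age'].fillna(median_age)` is probably the safer spelling.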
