I am trying to learn ML by working through the Titanic problem on Kaggle, and many of the Age
values are missing. I am following a tutorial to solve the problem.
If I use
median_age = df.groupby('Title')['Age'].transform('median')
then it gives me something like
>>> median_age.sample(12)
PassengerId
1053 4.0
891 29.0
524 35.0
570 29.0
1236 4.0
463 29.0
359 22.0
403 22.0
604 29.0
791 29.0
512 29.0
1290 29.0
Name: Age, dtype: float64
These are the median ages for each Title category, one value per passenger.
My question is: if I use
df['Age'].fillna(median_age, inplace=True)
how does it fill the missing values? What does this last line do internally?
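To make the question reproducible, here is a minimal sketch on toy data. The rows and values below are made up for illustration and are not from the Kaggle set, and the name `filled` is just a hypothetical helper. As far as I can tell from the pandas docs, when `fillna` is given a Series it matches values by index label and only replaces the entries that are NaN, which seems to be what happens here:

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration (not the Kaggle Titanic set).
df = pd.DataFrame(
    {
        "Title": ["Mr", "Mr", "Mr", "Miss", "Miss", "Master"],
        "Age": [30.0, np.nan, 40.0, 20.0, np.nan, 4.0],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6], name="PassengerId"),
)

# transform('median') returns a Series with the same index as df,
# where each row holds the median Age of that row's Title group
# (NaN values are skipped when the median is computed).
median_age = df.groupby("Title")["Age"].transform("median")

# Passing a Series to fillna matches values by index label and
# replaces only the entries of df['Age'] that are NaN.
filled = df["Age"].fillna(median_age)

# Show the original, the per-row group median, and the result side by side.
print(
    pd.DataFrame(
        {
            "Title": df["Title"],
            "Age": df["Age"],
            "median_age": median_age,
            "filled": filled,
        }
    )
)
```

In this sketch I assign the result to a new variable instead of using `inplace=True`, only so that the original and filled columns can be compared next to each other.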