I have a dataset as follows:

alldata.loc[:,["Age","Pclass"]].head(10)
Out[24]: 
    Age  Pclass
0  22.0       3
1  38.0       1
2  26.0       3
3  35.0       1
4  35.0       3
5   NaN       3
6  54.0       1
7   2.0       3
8  27.0       3
9  14.0       2

Now I want to fill every null value in Age with the mean Age of the rows that share its Pclass.

Example: in the snippet above, the null Age at index 5 has Pclass = 3, so it should take the mean of the known ages for Pclass = 3, which is (22 + 26 + 35 + 2 + 27) / 5 = 22.4.

I tried some solutions using groupby, but they changed only the rows for one specific Pclass value and turned the rest of the fields into nulls. How can I end up with zero null values in Age?

Sheldore
Akash Tyagi

1 Answer

You can use

1] transform with a lambda function

In [41]: df.groupby('Pclass')['Age'].transform(lambda x: x.fillna(x.mean()))
Out[41]:
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
5    22.4
6    54.0
7     2.0
8    27.0
9    14.0
Name: Age, dtype: float64

Or use

2] fillna with the per-group mean

In [46]: df['Age'].fillna(df.groupby('Pclass')['Age'].transform('mean'))
Out[46]:
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
5    22.4
6    54.0
7     2.0
8    27.0
9    14.0
Name: Age, dtype: float64
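
Note that 1] and 2] return a new Series and leave df itself untouched, so the result has to be assigned back if you want to keep it. A minimal self-contained sketch, assuming the ten sample rows from the question (the names class_mean and remaining_nulls are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan, 54.0, 2.0, 27.0, 14.0],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3, 2],
})

# For every row, the mean Age of that row's Pclass.
# mean skips NaN, and transform keeps the original index and length.
class_mean = df.groupby("Pclass")["Age"].transform("mean")

# fillna returns a new Series, so assign it back to the column.
df["Age"] = df["Age"].fillna(class_mean)

# Count the nulls left in Age after filling.
remaining_nulls = int(df["Age"].isna().sum())
print(remaining_nulls)
```

Because transform returns one value per original row, the fill values line up with df by index, which is probably why this avoids the "rest of the fields became null" problem described in the question.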

Or use

3] loc to overwrite only the null values

In [47]: df.loc[df['Age'].isnull(), 'Age'] = df.groupby('Pclass')['Age'].transform('mean')

In [48]: df
Out[48]:
    Age  Pclass
0  22.0       3
1  38.0       1
2  26.0       3
3  35.0       1
4  35.0       3
5  22.4       3
6  54.0       1
7   2.0       3
8  27.0       3
9  14.0       2
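
One caveat if the goal is strictly zero nulls: when every Age in a class is missing, that class's mean is itself NaN, so those rows stay null under all three options. A possible fallback, sketched here on made-up data (the frame below and the choice of the overall mean as fallback are assumptions, not part of the question), is a second fillna:

```python
import numpy as np
import pandas as pd

# Hypothetical data: every Age in Pclass 2 is missing.
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, np.nan, 30.0, np.nan],
    "Pclass": [3, 1, 2, 2, 3, 1],
})

# Per-class mean; it is NaN for Pclass 2 because no age is known there.
class_mean = df.groupby("Pclass")["Age"].transform("mean")

# First pass: fill from the class mean where one exists.
filled = df["Age"].fillna(class_mean)

# Second pass (assumed fallback): use the overall mean of the known ages.
filled = filled.fillna(df["Age"].mean())

df["Age"] = filled
```

Whether the overall mean is a sensible fallback depends on the data; the median or a fixed value would slot into the same second fillna.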
Zero