How to fillna by groupby outputs in pandas?

Question

I have a DataFrame with four columns (A, B, C, D). Column D has some NaN entries. I want to fill each NaN with the average of D over the rows that have the same values of A, B and C.

For example, if the values of A, B, C, D in a row are x, y, z and NaN respectively, I want that NaN to be replaced by the average of D over all rows where A, B, C are x, y, z respectively.

score 19 · Accepted Answer · answered Oct 15 '17 at 14:58

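df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean')) is faster than the groupby/apply approach (timings under Details). transform('mean') returns the group mean aligned to the original index, so fillna can use it directly.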

In [2400]: df
Out[2400]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0

In [2401]: df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
Out[2401]:
0    1.0
1    2.0
2    3.0
3    5.0
Name: D, dtype: float64

In [2402]: df['D'] = df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))

In [2403]: df
Out[2403]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0
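
For anyone who wants to run the session above outside IPython, here is a minimal self-contained sketch. The frame is the same four-row example; the variable name group_mean is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 1, 3],
    "B": [1, 1, 1, 3],
    "C": [1, 1, 1, 3],
    "D": [1.0, np.nan, 3.0, 5.0],
})

# Mean of D within each (A, B, C) group, broadcast back to every row,
# so it shares df's index and can be passed straight to fillna.
group_mean = df.groupby(["A", "B", "C"])["D"].transform("mean")

# fillna returns a new Series; assign it back to update the column.
df["D"] = df["D"].fillna(group_mean)
print(df)
```

Only the NaN positions take values from group_mean; existing values of D are left untouched.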

Details

In [2396]: df.shape
Out[2396]: (10000, 4)

In [2398]: %timeit df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
100 loops, best of 3: 3.44 ms per loop


In [2397]: %timeit df.groupby(['A','B','C'])['D'].apply(lambda x: x.fillna(x.mean()))
100 loops, best of 3: 5.34 ms per loop
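
The 10000-row frame used for the timings is not shown, so the following is only a sketch with made-up random data (the seed, group counts and NaN rate are assumptions). It checks that both approaches produce the same values; each expression can then be wrapped in %timeit to compare speed on your own machine.

```python
import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 rows, 5 levels per key, ~10% NaN in D.
rng = np.random.default_rng(0)
n = 10_000
big = pd.DataFrame({
    "A": rng.integers(0, 5, n),
    "B": rng.integers(0, 5, n),
    "C": rng.integers(0, 5, n),
    "D": rng.normal(size=n),
})
big.loc[rng.random(n) < 0.1, "D"] = np.nan

keys = ["A", "B", "C"]

# Approach 1: vectorised group mean via transform.
via_transform = big["D"].fillna(big.groupby(keys)["D"].transform("mean"))

# Approach 2: a Python lambda called once per group.
# group_keys=False keeps the original row index on the result.
via_apply = (
    big.groupby(keys, group_keys=False)["D"]
    .apply(lambda s: s.fillna(s.mean()))
    .sort_index()
)

same = np.allclose(via_transform.to_numpy(), via_apply.to_numpy(), equal_nan=True)
print(same)
```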

Hi @zero, I am trying to use your method for planets dataset, but it's not imputing all the values. Not sure why: https://stackoverflow.com/questions/73449902/fill-in-missing-values-with-groupby/73450241 — Roy, Aug 26 '22 at 18:58
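
I can't say what happened in the linked question, but two common reasons for NaNs surviving this pattern are (1) a group in which every value is NaN, so its mean is NaN too, and (2) NaN in a key column, since groupby drops such rows by default. A small sketch with a single made-up key column, assuming a recent pandas (the dropna argument to groupby was added in 1.1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2, np.nan, np.nan],           # last two keys are missing
    "D": [1.0, np.nan, np.nan, np.nan, 4.0, np.nan],
})

# Default behaviour: group A=2 is all NaN, and NaN keys are dropped,
# so rows 2, 3 and 5 are still NaN afterwards.
filled = df["D"].fillna(df.groupby("A")["D"].transform("mean"))
print(filled.isna().tolist())

# dropna=False treats the missing key as a group of its own.
filled2 = df["D"].fillna(df.groupby("A", dropna=False)["D"].transform("mean"))

# One possible fallback for all-NaN groups: the overall mean of D.
final = filled2.fillna(df["D"].mean())
print(final.tolist())
```

Whether an overall-mean fallback is appropriate depends on the data; it is shown here only as one option.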

score 9 · Answer 2 · answered Jan 16 '17 at 15:52

I think you need:

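# group_keys=False keeps the original row index so the result aligns with df (needed on pandas >= 2.0)
df.D = df.groupby(['A','B','C'], group_keys=False)['D'].apply(lambda x: x.fillna(x.mean()))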

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[1,1,1,3],
                   'B':[1,1,1,3],
                   'C':[1,1,1,3],
                   'D':[1,np.nan,3,5]})

print (df)
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0

df.D = df.groupby(['A','B','C'], group_keys=False)['D'].apply(lambda x: x.fillna(x.mean()))
print (df)
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0

score 2 · Answer 3 · edited May 23 '17 at 12:33

For further information, see this duplicate question: Pandas Dataframe: Replacing NaN with row average

Another approach suggested there is a plain fillna on the transpose: df.T.fillna(df.mean(axis=1)).T. Note that this fills each NaN with the mean of its own row, not with a per-group mean of D.
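
To make the transpose trick concrete, here is a small sketch on a made-up all-numeric frame (the column names a, b, c are just for illustration). df.mean(axis=1) gives one mean per row; after transposing, those rows are columns, so fillna can match each mean to its column.

```python
import numpy as np
import pandas as pd

wide = pd.DataFrame({
    "a": [1.0, 4.0],
    "b": [np.nan, 5.0],
    "c": [3.0, np.nan],
})

# Row means ignore NaN: row 0 -> (1 + 3) / 2, row 1 -> (4 + 5) / 2.
row_means = wide.mean(axis=1)

# Transpose, fill each column (an original row) with its mean, transpose back.
row_filled = wide.T.fillna(row_means).T
print(row_filled)
```

This suits data where the columns are comparable measurements; for the A, B, C, D layout in the question it would mix the key columns into the average.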

answered Jan 16 '17 at 15:59 · Fred Cascarini
