How to fill NA with the mean only for 2 or fewer consecutive NA values

Question

I am new to Python. Please help me work out how I should proceed. The following dataframe contains large blocks of NaNs.

# Fill the NAs with the mean only for 2 or fewer consecutive NA values.
# Refer to the documentation of fillna() to find out the parameter you would use to fill only a certain number of NAs.
# The resulting dataframe should look like df_filled shown below.

import numpy as np
import pandas as pd
df = pd.DataFrame({'val1': [4, np.nan, 7, np.nan, np.nan, 9, 5, np.nan, 1, 9, np.nan, np.nan, np.nan, 5, np.nan],
                   'val2': [np.nan, 5, 7, np.nan, np.nan, 8, 3, np.nan, 4, np.nan, np.nan, np.nan, np.nan, 21, np.nan]})

d = {'val1': {0: 4.0,1: 5.7142857142857144,2: 7.0,3: 5.7142857142857144,4: np.nan,5: 9.0,6: 5.0,7: np.nan,8: 1.0,9: 9.0,10: np.nan,11: np.nan,12: np.nan,13: 5.0,14: np.nan},
'val2': {0: 8.0,1: 5.0,2: 7.0,3: 8.0,4: np.nan,5: 8.0,6: 3.0,7: np.nan,8: 4.0,9: np.nan,10: np.nan,11: np.nan,12: np.nan,13: 21.0,14: np.nan}}

df_filled = pd.DataFrame(d)

Did you experience any difficulties with this part: `Refer to the documentation of fillna() to find out the parameter you would use to fill only a certain number of NAs.`? — MaxU - stand with Ukraine, Jul 28 '17 at 12:10

score 0 · Answer 1 · answered Jul 28 '17 at 12:58

Consider looping over the values of each column and recording three things: the sum of all non-NA values, the count of non-NA values, and an array of the indices of NA values that are the first or second in a run of consecutive NAs.

Example:

'val1':[4,np.nan,7,np.nan,np.nan,9,5, np.nan , 1,9,np.nan, np.nan,np.nan, 5, np.nan]

 sum = 40,
 count = 7,
 array_na = [1, 3, 4, 7, 10, 11, 14]
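The bookkeeping described above can be sketched roughly as follows. This is only one possible reading of the approach: the names `total`, `count`, `run` and `filled` are introduced here for illustration, and the loop fills the first two NAs of every run with the column mean.

```python
import numpy as np
import pandas as pd

val1 = pd.Series([4, np.nan, 7, np.nan, np.nan, 9, 5, np.nan, 1, 9,
                  np.nan, np.nan, np.nan, 5, np.nan])

total = 0.0    # sum of the non-NA values
count = 0      # number of non-NA values
array_na = []  # indices of NAs that are the 1st or 2nd in their run
run = 0        # length of the NA run seen so far (hypothetical helper)

for idx, value in val1.items():
    if pd.isna(value):
        run += 1
        if run <= 2:
            array_na.append(idx)
    else:
        run = 0
        total += value
        count += 1

# Fill only the recorded positions with the mean of the non-NA values
filled = val1.copy()
filled.loc[array_na] = total / count

print(total, count, array_na)
```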

With this logic, index 12 won't be filled with the mean, since it is the third consecutive np.nan. Also, I don't think this is the logic you had in mind: the description is quite confusing, and the expected result seems to be wrong:

{'val1': {0: 4.0,1: 5.7142857142857144,2: 7.0,3: 5.7142857142857144,4: np.nan,5: 9.0,6: 5.0,7: np.nan,8: 1.0,9: 9.0,10: np.nan,11: np.nan,12: np.nan,13: 5.0,14: np.nan}}

score 0 · Accepted Answer · answered Jul 28 '17 at 13:05


Let's try this

df["val1"] = df["val1"].transform(lambda x: x.fillna(x.mean(), limit=2))
df["val2"] = df["val2"].transform(lambda x: x.fillna(x.mean(), limit=2))
print(df)

Don't forget to let us know if it solved your problem :)
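As a hedged check, the sketch below applies a per-column `fillna` with `limit=2` and compares the outcome with `df_filled` from the question. It is worth noting that when a fill value is passed (rather than a fill method), `limit` caps the total number of NaNs filled along the column, not the length of each consecutive run; that is why the isolated NaN at index 7 of `val1` stays unfilled. The names `result` and `matches` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'val1': [4, np.nan, 7, np.nan, np.nan, 9, 5, np.nan, 1, 9,
             np.nan, np.nan, np.nan, 5, np.nan],
    'val2': [np.nan, 5, 7, np.nan, np.nan, 8, 3, np.nan, 4, np.nan,
             np.nan, np.nan, np.nan, 21, np.nan],
})

# Expected frame, copied from the question
d = {'val1': {0: 4.0, 1: 5.7142857142857144, 2: 7.0, 3: 5.7142857142857144,
              4: np.nan, 5: 9.0, 6: 5.0, 7: np.nan, 8: 1.0, 9: 9.0,
              10: np.nan, 11: np.nan, 12: np.nan, 13: 5.0, 14: np.nan},
     'val2': {0: 8.0, 1: 5.0, 2: 7.0, 3: 8.0, 4: np.nan, 5: 8.0, 6: 3.0,
              7: np.nan, 8: 4.0, 9: np.nan, 10: np.nan, 11: np.nan,
              12: np.nan, 13: 21.0, 14: np.nan}}
df_filled = pd.DataFrame(d)

# Fill at most the first two NaNs of each column with that column's mean
result = df.apply(lambda col: col.fillna(col.mean(), limit=2))

# Compare with a floating-point tolerance, treating NaN == NaN as equal
matches = np.allclose(result.to_numpy(), df_filled.to_numpy(), equal_nan=True)
print(matches)
```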

Prem


thanks I got the answer using df.fillna(df.mean(),limit=2) – Vijay Jul 30 '17 at 06:20
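If the title is read literally (fill a gap only when the whole run of consecutive NaNs has length 2 or less), `limit=2` is not sufficient on its own, since it counts filled values per column. A minimal sketch of that alternative reading follows; the helper `fill_short_gaps_with_mean` and its `max_gap` parameter are hypothetical names, and its output differs from the `df_filled` shown in the question.

```python
import numpy as np
import pandas as pd

def fill_short_gaps_with_mean(s, max_gap=2):
    """Fill NaNs with the series mean, but only in runs of at most max_gap NaNs."""
    is_na = s.isna()
    # Give each run of consecutive equal is_na values its own id
    run_id = (is_na != is_na.shift()).cumsum()
    # Number of NaNs in the run each element belongs to (0 for non-NaN runs)
    run_len = is_na.groupby(run_id).transform('sum')
    fill_here = is_na & (run_len <= max_gap)
    return s.mask(fill_here, s.mean())

val1 = pd.Series([4, np.nan, 7, np.nan, np.nan, 9, 5, np.nan, 1, 9,
                  np.nan, np.nan, np.nan, 5, np.nan])

short_filled = fill_short_gaps_with_mean(val1)
# Only the run of three NaNs (indices 10, 11, 12) is left unfilled
print(short_filled[short_filled.isna()].index.tolist())
```

For a whole dataframe, the same helper could be applied per column with `df.apply(fill_short_gaps_with_mean)`.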
