I am new to Python. Please help me with how I should proceed. The following dataframe contains large blocks of NaNs.

# Fill the NAs with the mean only for 2 or fewer consecutive NAs.
# Refer to the documentation of fillna() to find out the parameter you would use to fill only a certain number of NAs.
# The resulting dataframe should look like df_filled shown below.

import numpy as np
import pandas as pd

df = pd.DataFrame({'val1': [4, np.nan, 7, np.nan, np.nan, 9, 5, np.nan, 1, 9, np.nan, np.nan, np.nan, 5, np.nan],
                   'val2': [np.nan, 5, 7, np.nan, np.nan, 8, 3, np.nan, 4, np.nan, np.nan, np.nan, np.nan, 21, np.nan]})

d = {'val1': {0: 4.0,1: 5.7142857142857144,2: 7.0,3: 5.7142857142857144,4: np.nan,5: 9.0,6: 5.0,7: np.nan,8: 1.0,9: 9.0,10: np.nan,11: np.nan,12: np.nan,13: 5.0,14: np.nan},
'val2': {0: 8.0,1: 5.0,2: 7.0,3: 8.0,4: np.nan,5: 8.0,6: 3.0,7: np.nan,8: 4.0,9: np.nan,10: np.nan,11: np.nan,12: np.nan,13: 21.0,14: np.nan}}

df_filled = pd.DataFrame(d)
Joe T. Boka
Vijay
    Did you experience any difficulties with this part: `Refer to the documentation of fillna() to find out the parameter you would use to fill only a certain number of NAs.`? – MaxU - stand with Ukraine Jul 28 '17 at 12:10

2 Answers

0

You could loop over the values of each column and keep track of three things: the sum of all non-NaN values, the count of non-NaN values, and an array of the indices of the NaN values that are among the first two in a run of consecutive NaNs.

Example:

'val1':[4,np.nan,7,np.nan,np.nan,9,5, np.nan , 1,9,np.nan, np.nan,np.nan, 5, np.nan]

 sum = 40,
 count = 7,
 array_na = [1, 3, 4, 7, 10, 11, 14]

With my logic, index 12 won't be filled with the mean, since it is the third consecutive np.nan. Also, I don't think this is the logic you had in mind, since the description is quite confusing and the expected result seems to be wrong:

{'val1': {0: 4.0,1: 5.7142857142857144,2: 7.0,3: 5.7142857142857144,4: np.nan,5: 9.0,6: 5.0,7: np.nan,8: 1.0,9: 9.0,10: np.nan,11: np.nan,12: np.nan,13: 5.0,14: np.nan}
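
The per-run logic described in this answer could be sketched with pandas as follows. This is only an illustration of that interpretation (fill the first two NaNs of every consecutive run), not the expected df_filled from the question. The helper name fill_first_n_of_each_run and its parameter n are made up for this example.

```python
import numpy as np
import pandas as pd


def fill_first_n_of_each_run(s, n=2):
    """Fill the first n NaNs of every consecutive NaN run with the column mean.

    Hypothetical helper for illustration; not part of pandas.
    """
    is_na = s.isna()
    # A new run starts wherever the NaN flag differs from the previous row
    run_id = (is_na != is_na.shift()).cumsum()
    # 0-based position of each row inside its run
    pos_in_run = is_na.groupby(run_id).cumcount()
    to_fill = is_na & (pos_in_run < n)
    # Keep the original value unless the row is marked for filling
    return s.where(~to_fill, s.mean())


val1 = pd.Series([4, np.nan, 7, np.nan, np.nan, 9, 5, np.nan,
                  1, 9, np.nan, np.nan, np.nan, 5, np.nan])
result = fill_first_n_of_each_run(val1, n=2)

# Indices that were NaN before and hold the mean now
filled = result.index[val1.isna() & result.notna()].tolist()
print(filled)  # [1, 3, 4, 7, 10, 11, 14]
```

Index 12 stays NaN because it is the third NaN in its run, which matches array_na above.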
0

Let's try this

# limit=2 fills at most the first two NaNs in each column
df["val1"] = df["val1"].fillna(df["val1"].mean(), limit=2)
df["val2"] = df["val2"].fillna(df["val2"].mean(), limit=2)
print(df)
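
As a rough check, the snippet below rebuilds the question's df and df_filled and compares them after a per-column fillna with limit=2. The df_filled values are copied from the question's dictionary and written as lists here; the loop and the variable name result are just one way to write it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "val1": [4, np.nan, 7, np.nan, np.nan, 9, 5, np.nan,
             1, 9, np.nan, np.nan, np.nan, 5, np.nan],
    "val2": [np.nan, 5, 7, np.nan, np.nan, 8, 3, np.nan,
             4, np.nan, np.nan, np.nan, np.nan, 21, np.nan],
})

# Expected frame from the question (same values as its dict d)
df_filled = pd.DataFrame({
    "val1": [4.0, 5.7142857142857144, 7.0, 5.7142857142857144, np.nan, 9.0, 5.0,
             np.nan, 1.0, 9.0, np.nan, np.nan, np.nan, 5.0, np.nan],
    "val2": [8.0, 5.0, 7.0, 8.0, np.nan, 8.0, 3.0, np.nan,
             4.0, np.nan, np.nan, np.nan, np.nan, 21.0, np.nan],
})

result = df.copy()
for col in result.columns:
    # With a fill value, limit caps the total number of NaNs filled per column
    result[col] = result[col].fillna(result[col].mean(), limit=2)

# Raises AssertionError if the frames differ; NaNs in the same spot count as equal
pd.testing.assert_frame_equal(result, df_filled)
```

Note that limit here counts NaNs from the top of each column, not per consecutive run, which is why only the first two NaNs in each column end up filled.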


Don't forget to let us know if it solved your problem :)

Prem