Python pandas fillna only one row with specific value

Question

EDITED:

I have a (not very simple) dataframe:

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4
    , np.nan, np.nan, np.nan, 5], columns=['att1'])

     att1
0  1.0000
1  2.0000
2     nan
3     nan
4     nan
5     nan
6  3.0000
7  4.0000
8     nan
9     nan
10    nan
11 5.0000

I want to fill NaN values with the previous non-NaN value, except for the last NaN value. I want the last NaN value to remain NaN after filling. How can I do that?

I want this result:

I tried this:

df = df.fillna(value='missing', method='bfill', limit=1)
df = df.fillna(method='ffill')

But the first line raises this error:

ValueError: cannot specify both a fill method and value

Why is there this limitation in pandas 0.17.1 / Python 3.5? Thank you!

IIUC then doesn't this do what you want: `df.fillna(value='missing', limit=3)`? — EdChum, Feb 03 '16 at 13:22
Thank you! I edited my question because first I used a simple `df` but I have several `NAN` blocks in my dataframe so I don't know the `forward` `limit`, only the `backward` `limit` that is `1`. — ragesz, Feb 03 '16 at 13:42

jezrael · Accepted Answer · 2016-02-03T14:12:01.763

You can count the NaN values in df['att1'], subtract 1, and then use the result as the limit parameter of fillna:

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3], columns=['att1'])
print(df)
   att1
0     1
1     2
2   NaN
3   NaN
4   NaN
5   NaN
6     3

# number of NaN values minus one, so that the last NaN is left unfilled
s = df['att1'].isnull().sum() - 1
df['att1'] = df['att1'].fillna('missing', limit=s)
print(df)
      att1
0        1
1        2
2  missing
3  missing
4  missing
5      NaN
6        3

EDIT:

Now it is more complicated.

First, create a helper column count that labels consecutive runs of null and non-null values in column att1, using isnull, shift, astype and cumsum. Then groupby this count column and apply fillna to each group:

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4
    , np.nan, np.nan, np.nan, 5], columns=['att1'])
print(df)

# give each run of consecutive null / non-null values its own number
df['count'] = (df['att1'].isnull() != df['att1'].isnull().shift()).astype(int).cumsum()
print(df)
    att1  count
0      1      1
1      2      1
2    NaN      2
3    NaN      2
4    NaN      2
5    NaN      2
6      3      3
7      4      3
8    NaN      4
9    NaN      4
10   NaN      4
11     5      5

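def f(x):
    # x is one group: a run of consecutive null or non-null values
    att = x['att1'].isnull()
    # a run with a single NaN is left alone, because limit must be greater than 0
    if att.all() and att.sum() > 1:
        # all-NaN run: fill every NaN except the last one
        return x['att1'].fillna('missing', limit=att.sum() - 1)
    else:
        return x['att1']

print(df.groupby(['count']).apply(f).reset_index(drop=True))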

0           1
1           2
2     missing
3     missing
4     missing
5         NaN
6           3
7           4
8     missing
9     missing
10        NaN
11          5
Name: att1, dtype: object

Explanation of the count column:

print(df['att1'].isnull() != df['att1'].isnull().shift())
0      True
1     False
2      True
3     False
4     False
5     False
6      True
7     False
8      True
9     False
10    False
11     True
Name: att1, dtype: bool

print((df['att1'].isnull() != df['att1'].isnull().shift()).astype(int))
0     1
1     0
2     1
3     0
4     0
5     0
6     1
7     0
8     1
9     0
10    0
11    1
Name: att1, dtype: int32

print((df['att1'].isnull() != df['att1'].isnull().shift()).astype(int).cumsum())
0     1
1     1
2     2
3     2
4     2
5     2
6     3
7     3
8     4
9     4
10    4
11    5
Name: att1, dtype: int32
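
As a possible alternative to the groupby approach above, the same result can probably be reached with a boolean mask: a NaN gets filled only when the row after it is NaN as well. This is only a sketch, not part of the original answer. The names is_nan, fill_here and filled are made up for illustration, and the fill_value argument of shift assumes a newer pandas (0.24 or later) than the 0.17.1 from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])

is_nan = df['att1'].isnull()

# True where this row is NaN and the next row is NaN too,
# i.e. every NaN except the last one of its block.
# fill_value=False treats the position past the end as "not NaN"
# (assumption: pandas >= 0.24, where shift accepts fill_value).
fill_here = is_nan & is_nan.shift(-1, fill_value=False)

# Cast to object first so strings and numbers can share one column.
filled = df['att1'].astype(object).mask(fill_here, 'missing')
print(filled)
```

With this mask, a block that contains a single NaN is simply left untouched, so no special case for limit is needed.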

Thank you! I used a simple `df` example but of course I have several blocks of `NAN` values in my dataframe so when I calculate the `s` for `limit`, this solution will fail :( — ragesz, Feb 03 '16 at 13:25

score 1 · Answer 2 · answered Feb 03 '16 at 14:27

An alternative method that is maybe a little less complex is to create a list of index points at which you expect the NaNs to remain (positions that are null while the next index point is not null). Then you just forward fill your data and reinsert the NaNs using the list you created.

import pandas as pd
import numpy as np
from numpy import nan as NA
df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4
    , np.nan, np.nan, np.nan, 5], columns=['att1'])

# create list of index points where you want the NaNs to be
# (xrange is Python 2 only; use range in Python 3)
Nan_ind = [x - 1 for x in xrange(1, df.index[-1] + 1)
           if pd.notnull(df.loc[x, 'att1'])
           and pd.isnull(df.loc[x - 1, 'att1'])]

# forward fill the NaNs
df['att1'] = df['att1'].fillna(method='ffill')

# reinsert NaNs using your list of index points
df.loc[Nan_ind, 'att1'] = NA
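
The list comprehension above could also be written without a Python-level loop. The following is a rough sketch of the same idea (forward fill, then put the NaNs back), not the answer's original code. The names is_nan, last_nan and result are illustrative, and shift(..., fill_value=False) assumes pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3, 4,
                   np.nan, np.nan, np.nan, 5], columns=['att1'])

is_nan = df['att1'].isnull()

# Last NaN of a block: this row is NaN and the next row holds a value.
# fill_value=False means a trailing run of NaNs is not marked,
# which matches what the loop-based version does.
last_nan = is_nan & (~is_nan).shift(-1, fill_value=False)

# Forward fill everything, then blank out the marked positions again.
result = df['att1'].ffill().mask(last_nan)
print(result)
```

Here rows 5 and 10 stay NaN, while all other NaNs take the previous non-NaN value.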

Thank you! It works perfectly as well!! In Python 3 `range()` can be used — ragesz, Feb 03 '16 at 14:43

score 0 · Answer 3 · answered Jul 08 '18 at 20:27

Fill all NaN values with "missing" using fillna. You can then replace the last "missing" with NaN.

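# fill every NaN, then assign the result back to the column
df['att1'] = df['att1'].fillna("missing")
# set the second-to-last row back to NaN (np is numpy, imported as in the question)
df.loc[df.index[-2], 'att1'] = np.nan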

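A negative position counts backwards from the end of the index, so -2 selects the second-to-last element of the 'att1' column. Note that this only restores the NaN in that one row, so with the sample data it covers the last block of NaNs only.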
