
I have a dataframe that looks like this:

import pandas as pd

df = pd.DataFrame({'animals': [None, 'cat', 'dog', None, 'hippo', 'elephant']})

The column animals has two None values.

I want to replace the first missing value with one value and the second missing value with another value.

The code I have so far replaces only the first missing value; the second missing value is not updated.

new_df = df.animals.fillna(pd.Series(['unknown1', 'unknown2']))
new_df

0    unknown1
1         cat
2         dog
3         NaN
4       hippo
5    elephant
Name: animals, dtype: object

I expected the value at index 3 to be unknown2.

How can I replace the missing values in a given column with the values of a pandas Series whose length equals the number of missing values in that column?

cs95

2 Answers


Use fillna with a Series of the same length as the column:

s = pd.Series(['unknown1', 'unknown2'])
df['animals'] = df['animals'].fillna(df['animals'].isna().cumsum().sub(1).map(s))
df

    animals
0  unknown1
1       cat
2       dog
3  unknown2
4     hippo
5  elephant

How this works
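The issue with your current approach is that fillna accepts either a single fill value or a dict/Series/DataFrame, and a Series is aligned on the index, not by position: the NaN at label i is filled with the value at label i of the Series, if that label exists. Your Series has the default labels 0 and 1, so the NaN at label 0 gets unknown1, while the NaN at label 3 has no matching label and stays NaN. From the docs: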

value : scalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame).
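A minimal sketch of that alignment rule, using the question's DataFrame. The variable names by_position and by_label are illustrative only; the point is that only NaN rows whose label also appears in the fill Series get filled.

```python
import pandas as pd

df = pd.DataFrame({'animals': [None, 'cat', 'dog', None, 'hippo', 'elephant']})

# Default RangeIndex on the fill Series: labels 0 and 1.
# Label 0 matches the first NaN row; the NaN at label 3 has no match and stays missing.
by_position = df['animals'].fillna(pd.Series(['unknown1', 'unknown2']))
print(by_position.tolist())

# Labelling the fill values with the NaN rows' own labels (0 and 3)
# lets fillna match both missing entries.
by_label = df['animals'].fillna(pd.Series(['unknown1', 'unknown2'], index=[0, 3]))
print(by_label.tolist())
```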

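So we get around this by creating a temporary Series, with the same index as the column, that holds the fill value we want at each NaN position. First, a cumulative sum of the NaN mask counts how many NaNs have been seen so far: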

df['animals'].isna().cumsum()

0    1
1    1
2    1
3    2
4    2
5    2
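Name: animals, dtype: int64

Subtracting 1 turns this running count into a 0-based position, and map looks each position up in s. The temporary Series also holds values at rows that were never missing, but fillna only reads it at the NaN rows, so those extra values are ignored: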

s = pd.Series(['unknown1', 'unknown2'])
df['animals'].isna().cumsum().sub(1).map(s)

0    unknown1
1    unknown1
2    unknown1
3    unknown2
4    unknown2
5    unknown2
Name: animals, dtype: object
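The same steps can be wrapped in a small reusable function. This is a sketch, not part of the answer: the name fill_na_in_order is hypothetical, and the length check is an added assumption so that a mismatch between the number of NaNs and the number of fill values fails loudly.

```python
import pandas as pd


def fill_na_in_order(col: pd.Series, values: list) -> pd.Series:
    """Fill the k-th missing entry of `col` with values[k] (hypothetical helper)."""
    mask = col.isna()
    n_missing = int(mask.sum())
    if n_missing != len(values):
        raise ValueError(f"expected {n_missing} fill values, got {len(values)}")
    # Running count of NaNs seen so far, shifted to 0-based positions.
    positions = mask.cumsum().sub(1)
    # Look each position up in a Series of fill values (labels 0..k-1).
    # Rows before the first NaN get position -1, which maps to NaN, but
    # fillna only reads the temporary Series at missing rows.
    temp = positions.map(pd.Series(values))
    return col.fillna(temp)


df = pd.DataFrame({'animals': [None, 'cat', 'dog', None, 'hippo', 'elephant']})
df['animals'] = fill_na_in_order(df['animals'], ['unknown1', 'unknown2'])
print(df)
```

It also handles a column whose first row is not missing, since the -1 positions are never read.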
cs95
  • Great solution, thank you. The details as to why this works are really helpful – user2926337 May 06 '23 at 17:25
  • @user2926337 if the answer helped you can mark it accepted by clicking on the grey check to the left of the answer to toggle it green. you can only accept one answer. – cs95 May 09 '23 at 06:00

How can I replace the missing values in a given column with the values of a pandas Series whose length equals the number of missing values in that column?

You can simply use loc with a plain list instead of a pd.Series, which avoids the index alignment issue:

df.loc[df['animals'].isna(), 'animals'] = ['unknown1', 'unknown2']

Output:

>>> df
    animals
0  unknown1
1       cat
2       dog
3  unknown2
4     hippo
5  elephant
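Because a plain list carries no index, loc assigns it in order to the selected rows, so the list needs exactly one entry per missing value. A minimal sketch that works on a copy; the name fill_na_with_loc and the explicit length check are assumptions added for illustration, not part of the answer.

```python
import pandas as pd


def fill_na_with_loc(df: pd.DataFrame, column: str, values: list) -> pd.DataFrame:
    """Return a copy of df with the missing entries of `column` replaced in order (hypothetical helper)."""
    out = df.copy()
    mask = out[column].isna()
    n_missing = int(mask.sum())
    if n_missing != len(values):
        raise ValueError(f"expected {n_missing} fill values, got {len(values)}")
    # The list is assigned row by row to the masked (missing) rows, top to bottom.
    out.loc[mask, column] = values
    return out


df = pd.DataFrame({'animals': [None, 'cat', 'dog', None, 'hippo', 'elephant']})
filled = fill_na_with_loc(df, 'animals', ['unknown1', 'unknown2'])
print(filled)
```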

Your original code only works if the Series uses the same index labels as the missing rows:

>>> df['animals'].fillna(pd.Series(['unknown1', 'unknown2'], 
                                   index=df[df['animals'].isna()].index))

0    unknown1
1         cat
2         dog
3    unknown2
4       hippo
5    elephant
Name: animals, dtype: object
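All three approaches in this thread should also agree when the DataFrame has a non-default index: the cumsum/map temporary Series and the loc mask both carry the column's own index, and the index-matched Series takes its labels from the missing rows. A quick check under that assumption, using a made-up string index:

```python
import pandas as pd

# Same data as the question, with arbitrary string labels (illustrative only).
df = pd.DataFrame(
    {'animals': [None, 'cat', 'dog', None, 'hippo', 'elephant']},
    index=list('abcdef'),
)
values = ['unknown1', 'unknown2']
mask = df['animals'].isna()

# First answer: running NaN count -> 0-based position -> fill value.
via_cumsum = df['animals'].fillna(mask.cumsum().sub(1).map(pd.Series(values)))

# Second answer, loc: assign a plain list to the masked rows of a copy.
via_loc = df['animals'].copy()
via_loc.loc[mask] = values

# Second answer, fillna: give the fill Series the labels of the missing rows.
via_index = df['animals'].fillna(pd.Series(values, index=df[mask].index))

print(via_cumsum.equals(via_loc) and via_loc.equals(via_index))
```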
Corralien