10

I am trying to replace certain strings in a column in pandas, but am getting NaN for some rows. The column is an object datatype.

I want all rows with 'n' in the string replaced with 'N' and and all rows with 's' in the string replaced with 'S'. In other words, I am trying to capitalize the string when it appears.

However, I am am getting NaN values for rows without 'n' or 's' in the string. How can I replace 'n' and 's' without getting NaN for the other values?

Here is the head of my dataframe:

data_frame['column_name'].head(10)
0      1n
1      1n
2      1n
3      1n
4      2n
5      2s
6       3
7       3
8      4s
9      4s

After replacing, the string '3' is now NaN:

data_frame['column_name'] = data_frame['column_name'].str.replace('n', 'N')
data_frame['column_name'] = data_frame['column_name'].str.replace('s', 'S')

data_frame['column_name'].head(10)
Out[87]: 
0     1N
1     1N
2     1N
3     1N
4     2N
5     2S
6    NaN
7    NaN
8     4S
9     4S
Name: NCU, dtype: object

Please let me know if I can add more information.

Scott Davis
  • 983
  • 6
  • 22
  • 43

1 Answers1

7

Simpliest solution is cast column to string - then is possible use str.upper or str.replace:

data_frame['column_name'] = data_frame['column_name'].astype(str)
data_frame['column_name'] = data_frame['column_name'].str.replace('n', 'N')
data_frame['column_name'] = data_frame['column_name'].str.replace('s', 'S')
print (data_frame)
  column_name
0          1N
1          1N
2          1N
3          1N
4          2N
5          2S
6           3
7           3
8          4S
9          4S

But if need numeric with strings together:

I think you need Series.replace, because you have mixed values - numeric with strings and str.replace return NaN where numeric values (bur works another solution with mask):

data_frame['column_name'] = data_frame['column_name'].replace(['n', 's'],
                                                              ['S','N'],
                                                              regex=True)
print (data_frame)
  column_name
0          1S
1          1S
2          1S
3          1S
4          2S
5          2N
6           3
7           3
8          4N
9          4N

Another solution is filter only string and use Series.mask with str.upper:

mask = data_frame['column_name'].apply(type) == str
data_frame['column_name'] = data_frame['column_name'].mask(mask,
                            data_frame['column_name'].str.upper())
print (data_frame)
  column_name
0          1N
1          1N
2          1N
3          1N
4          2N
5          2S
6           3
7           3
8          4S
9          4S

Another solution is replace NaN by combine_first or fillna:

upper = data_frame['column_name'].str.upper()
data_frame['column_name'] = upper.combine_first(data_frame['column_name'])
#alternative solution
#data_frame['column_name'] = upper.fillna(data_frame['column_name'])
  column_name
0          1N
1          1N
2          1N
3          1N
4          2N
5          2S
6           3
7           3
8          4S
9          4S
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Thank you jezrael, I had to convert the datatype to str. Its been awhile with pandas, I thought the 'object' datatype was the same type of string. – Scott Davis May 10 '17 at 18:01
  • Yes, it is obviously string. But not always, you can check [this](http://stackoverflow.com/questions/42672552/pandas-cast-column-to-string-does-not-work/42672574#42672574) – jezrael May 10 '17 at 19:05