
I've got some strange values in the date column of my dataset, and I'm trying to change these unexpected values into NaN.

I don't know in advance what the unexpected values will be, which is why I made df2: I search for rows containing a month (e.g. Dec, March), remove them, and look at what's left. So now I know that the weird data is in rows 1 and 3 (zero-based index). But how do I now change the Birthday column value for rows 1 and 3 to NaN?

My real dataset is much bigger, so typing in the row numbers manually isn't practical.

# Create the example df
import pandas as pd

data = {'Age': [20, 21, 19, 18],
        'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Birthday': ["Dec-82", "heidgo", "Mar-84", "ishosdg"]}
df = pd.DataFrame(data)

# Find out which rows have the weird values
df2 = df[~df["Birthday"].str.contains("Dec|Mar")]
wick

1 Answer


Locate records that fit the condition to fill their Birthday column with NaN:

import numpy as np

# Rows whose Birthday does not contain a known month get NaN
df.loc[~df["Birthday"].str.contains("Dec|Mar"), 'Birthday'] = np.nan

   Age   Name Birthday
0   20    Tom   Dec-82
1   21   nick      NaN
2   19  krish   Mar-84
3   18   jack      NaN
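
If the valid month names aren't known up front, one possible variation is to let pandas decide what counts as a date instead of listing months in a regex. This is only a sketch under assumptions that aren't stated in the question: every valid value follows the abbreviated-month, dash, two-digit-year pattern seen in the example (like Dec-82), and month abbreviations are in English (the `%b` directive is locale-dependent). The name `parsed` is just a helper introduced for illustration.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    "Age": [20, 21, 19, 18],
    "Name": ["Tom", "nick", "krish", "jack"],
    "Birthday": ["Dec-82", "heidgo", "Mar-84", "ishosdg"],
})

# Assumption: valid birthdays look like "Dec-82" (%b = abbreviated month, %y = 2-digit year).
# errors="coerce" turns anything that does not match into NaT instead of raising.
parsed = pd.to_datetime(df["Birthday"], format="%b-%y", errors="coerce")

# Keep the original string where parsing succeeded; everything else becomes NaN
df["Birthday"] = df["Birthday"].where(parsed.notna())

print(df)
```

Because the mask comes from parsing rather than from a hard-coded list, it should scale to a larger dataset without typing row numbers or month names by hand, as long as the format assumption holds.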
RomanPerekhrest
  • Thanks for the reply. But it's the opposite of what I'm trying to do. I would want the Birthday column to say: Dec-82, NaN, Mar-84, NaN – wick Feb 19 '23 at 18:18
  • @wick, but you wrote *change the Birthday column value for row 1 and row 3 to say NaN?* – RomanPerekhrest Feb 19 '23 at 18:20
  • Yes, that is rows 1 and 3, isn't it? Because Python uses zero-based indexing. Unless I'm mistaken. – wick Feb 19 '23 at 19:24
  • @wick, the description sounded confusing in terms of indexed and ordinal numbers that you mentioned. See my update – RomanPerekhrest Feb 19 '23 at 19:34