0

I'm trying to clean test data from Kaggle's Titanic dataset, specifically the columns - sex, fare, pclass, and age. In order to do this, I'd like to find out if any of these columns have empty values. I load the data:

import pandas as pd
test_address = 'path_to_data\test.csv'
test = pd.read_csv(test_address)

When I try to look for empty values in the columns,

True in test['Sex'].isna()

outputs True.

However,

test['Sex'].isna().value_counts()

outputs

False    418
Name: Sex, dtype: int64

which should mean there aren't any empty values (I could confirm this by visually scanning the csv file). These commands on test['Pclass'] have similar outputs.

Why is the 'True in' command giving me the wrong answer?

1 Answers1

1

The operator in, when applied to a Series, checks if its left operand is in the index of the right operand. Since there is a row #1 (the numeric representation of True) in the series, the operator evaluates to True.

For the same reason False in df['Sex'].isna() is True, but False in df['Sex'][1:].isna() is False (there is no row #0 in the latter slice).

You should check if True in df['Sex'].isna().values.

DYZ
  • 55,249
  • 10
  • 64
  • 93
  • Great, thanks @DyZ. `True in test['Sex'].isna().values` works as expected (gives `False`). So do `False in test['Sex'][1:].isna()` (`False`) and `True in test['Sex'].isna()[2:]` (`False`)! – Aniket Deshpande Mar 22 '18 at 05:19