Pass na_filter = False
to read your columns in as strings. Then locate all the rows with bad data and filter them out of your dataframe.
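For a self-contained run, here is a hypothetical reconstruction of sample.csv matching the output below (the filename and contents are assumptions — two files concatenated, so the header row repeats mid-data):

```python
import pandas as pd

# Hypothetical sample.csv: two CSV files concatenated,
# so the header row appears a second time in the middle.
with open("sample.csv", "w") as f:
    f.write(
        "col1,col2,col3\n"
        "0,1,1\n"
        "0,0,0\n"
        "1,1,1\n"
        "col1,col2,col3\n"
        "0,1,1\n"
        "0,0,0\n"
        "1,1,1\n"
    )

df = pd.read_csv("sample.csv", header=0, na_filter=False)
# The stray header row forces every column to object dtype,
# so each cell comes back as a str.
```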
>>> df = pd.read_csv('sample.csv', header = 0, na_filter = False)
>>> df
col1 col2 col3
0 0 1 1
1 0 0 0
2 1 1 1
3 col1 col2 col3
4 0 1 1
5 0 0 0
6 1 1 1
>>> type(df.iloc[0,0])
<class 'str'>
Now that the data in each column is parsed as strings, locate the stray col1, col2, and col3
values in your df and tag the rows that contain them in a new column, using np.where()
like so:
>>> df['Tag'] = np.where(((df['col1'] != '0') & (df['col1'] != '1')) & ((df['col2'] != '0') & (df['col2'] != '1')) & ((df['col3'] != '0') & (df['col3'] != '1')), ['Remove'], ['Don\'t remove'])
>>> df
col1 col2 col3 Tag
0 0 1 1 Don't remove
1 0 0 0 Don't remove
2 1 1 1 Don't remove
3 col1 col2 col3 Remove
4 0 1 1 Don't remove
5 0 0 0 Don't remove
6 1 1 1 Don't remove
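Equivalently, instead of spelling out every comparison, you can build the same mask with isin() — a shorter sketch, with the data hardcoded in place of sample.csv:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["0", "0", "1", "col1", "0", "0", "1"],
    "col2": ["1", "0", "1", "col2", "1", "0", "1"],
    "col3": ["1", "0", "1", "col3", "1", "0", "1"],
})

# A row is a stray header when every cell falls outside the valid {'0', '1'} set.
bad = (~df.isin(["0", "1"])).all(axis=1)
df["Tag"] = np.where(bad, "Remove", "Don't remove")
```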
Now, filter out the rows tagged as Remove in the Tag column using isin():
>>> df2 = df[~df['Tag'].isin(['Remove'])]
>>> df2
col1 col2 col3 Tag
0 0 1 1 Don't remove
1 0 0 0 Don't remove
2 1 1 1 Don't remove
4 0 1 1 Don't remove
5 0 0 0 Don't remove
6 1 1 1 Don't remove
Drop the Tag column:
>>> df2 = df2[['col1', 'col2', 'col3']]
>>> df2
col1 col2 col3
0 0 1 1
1 0 0 0
2 1 1 1
4 0 1 1
5 0 0 0
6 1 1 1
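As an aside, drop() does the same thing without re-listing every column, which is handy when there are many — a sketch on a minimal frame:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["0"], "col2": ["1"], "col3": ["1"],
                   "Tag": ["Don't remove"]})
df2 = df.drop(columns=["Tag"])  # everything except the helper column
```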
Finally, typecast your dataframe to int if you need integer values:
>>> df2 = df2.astype(int)
>>> df2
col1 col2 col3
0 0 1 1
1 0 0 0
2 1 1 1
4 0 1 1
5 0 0 0
6 1 1 1
>>> type(df2['col1'][0])
<class 'numpy.int32'>
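The exact width shown above (numpy.int32) depends on NumPy's platform default — typically 32-bit on Windows and 64-bit elsewhere. If the width matters, pin it explicitly; a sketch:

```python
import pandas as pd

df2 = pd.DataFrame({"col1": ["0", "1"], "col2": ["1", "0"]})
df2 = df2.astype("int64")  # explicit width, same on every platform
```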
Note: if you want a standard sequential index, use:
>>> df2.reset_index(inplace = True, drop = True)
>>> df2
col1 col2 col3
0 0 1 1
1 0 0 0
2 1 1 1
3 0 1 1
4 0 0 0
5 1 1 1
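Putting it all together, the whole cleanup condenses to a few lines. Same hypothetical sample.csv as above (two concatenated files with a repeated header), written inline so the sketch is self-contained:

```python
import pandas as pd

with open("sample.csv", "w") as f:
    f.write("col1,col2,col3\n0,1,1\n0,0,0\n1,1,1\n"
            "col1,col2,col3\n0,1,1\n0,0,0\n1,1,1\n")

df = pd.read_csv("sample.csv", header=0, na_filter=False)
clean = (
    df[df.isin(["0", "1"]).all(axis=1)]  # keep rows whose every cell is valid
      .astype(int)
      .reset_index(drop=True)
)
```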