
My CSV file contains numeric data, but some values include greater-than or less-than symbols, e.g. ">244". I want the column's data type to be float. When reading the file into pandas:

df = pd.read_csv('file.csv')

I get a warning:

Columns (2) have mixed types. Specify dtype option on import or set low_memory=False.

I have checked this question: Pandas read_csv: low_memory and dtype options, and tried specifying the data type of the relevant column with:

df = pd.read_csv('file.csv', dtype={'column': 'float'})

However, this gives an error:

ValueError: could not convert string to float: '>244'

I have also tried

df = pd.read_csv('file.csv', dtype={'column': 'float'}, error_bad_lines=False)

However, this does not solve the problem; I get the same ValueError as above.

My problem appears to be that my data has a mixture of strings and floats. Can I ignore any rows containing strings in particular columns when reading in the data?

nf2
  • `>244` is not a float, use `df = pd.read_csv('file.csv', dtype={'column': 'str'})` – Ynjxsjmh Aug 01 '22 at 12:33
  • OK. To clarify, I don't want my datatype to be a string, can I ignore the strings when reading in the data? – nf2 Aug 01 '22 at 12:40

2 Answers


I found a workaround: first, read in the data as-is:

df = pd.read_csv('file.csv')

Then remove any rows whose value starts with '<' or '>':

df = df.loc[df['column'].str[:1] != '<']
df = df.loc[df['column'].str[:1] != '>']

Then convert to numeric with pd.to_numeric:

df['column'] = pd.to_numeric(df['column'])
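Putting the steps together, a minimal end-to-end sketch (using a small inline DataFrame in place of the CSV, for illustration only):

```python
import pandas as pd

# Illustrative stand-in for the mixed-type CSV column
df = pd.DataFrame({'column': [1.5, '>244', 3.0, '<10', '7']})

# Drop rows whose value starts with '<' or '>'. The .str accessor
# returns NaN for non-string entries, and NaN != '<' evaluates True,
# so already-numeric values are kept.
df = df.loc[df['column'].str[:1] != '<']
df = df.loc[df['column'].str[:1] != '>']

# Convert what remains to a numeric dtype
df['column'] = pd.to_numeric(df['column'])
```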
nf2

You can use:

df = pd.read_csv('file.csv', dtype={'column':'str'})

Then:

df['column'] = pd.to_numeric(df['column'], errors='coerce')
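For instance (with a small inline Series standing in for the column), errors='coerce' turns values such as '>244' into NaN, which can then be dropped if desired:

```python
import pandas as pd

# Illustrative data: plain numbers parse normally, '>244' becomes NaN
s = pd.Series(['1.5', '>244', '3'])
out = pd.to_numeric(s, errors='coerce')  # float64 with NaN at index 1
clean = out.dropna()                     # keeps only the valid floats
```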
Corralien