
I am trying to load data into a polars DataFrame using the read_csv command, but I keep getting this error:

RuntimeError: Any(ComputeError("Could not parse 0.5 as dtype Int64 at column 13.
    The total offset in the file is 11684833 bytes.
    Consider running the parser `with_ignore_parser_errors=true`
    or consider adding 0.5 to the `null_values` list."))

I tried using the converters argument as follows:

converters = {
    'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
    'Number': lambda x: float(x)
}

The error persists. I also tried the argument suggested in the error message:

with_ignore_parser_errors=True

The error is still there. What can I do? My issue is not with parsing dates, but rather with parsing numbers. This is what I have for now:

    converters = {
        'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
        'Number': lambda x: float(x)
    }
    df_file = pl.read_csv(file_to_read, has_headers=True, converters=converters, with_ignore_parser_errors=True)
Rayen
  • I also tried to use a function as the converter: def col_fixer(x): try: return float(x) except ValueError: return np.str, and then df_file = pl.read_csv(file_to_read, has_headers=True, converters=dict(B=col_fixer)) – Rayen Jan 24 '22 at 19:05

1 Answer


Polars doesn't have a converters argument, so that won't work.

It seems that a column containing floating point values is being parsed as Int64. You can manually override that column's dtype by passing a dict of column name to dtype: pl.read_csv(..., dtype={"foo": pl.Float64}).
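
For example, a minimal sketch along those lines, assuming the offending column is the "Number" column from your converters dict, that file_to_read points at your CSV, and that your polars version still accepts the dtype and has_headers spellings used here (newer releases spell them dtypes/schema_overrides and has_header):

    import polars as pl

    file_to_read = "data.csv"  # hypothetical path; substitute your own file

    # Force the troublesome column to Float64 so values like 0.5 parse,
    # instead of letting schema inference lock it to Int64.
    df_file = pl.read_csv(
        file_to_read,
        has_headers=True,              # older polars spelling; newer versions use `has_header`
        dtype={"Number": pl.Float64},  # "Number" is assumed from the question's converters dict
    )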

Or you can increase the infer_schema_length so that polars automatically detects floats (the first 100 rows probably only contain integers).

The default is 100; try increasing it until schema inference correctly detects the floating point column.
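
A sketch of that approach, again assuming file_to_read is the path variable from the question:

    import polars as pl

    file_to_read = "data.csv"  # hypothetical path; substitute your own file

    # Scan more rows before fixing the dtypes, so a value like 0.5 is seen
    # and the column is inferred as Float64 rather than Int64.
    df_file = pl.read_csv(
        file_to_read,
        has_headers=True,           # older polars spelling; newer versions use `has_header`
        infer_schema_length=10000,  # default is 100
    )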

ritchie46