I have a CSV file with malformed data in the last column: that column can contain " or , characters.
I would now like to read this CSV into a DataFrame using polars in Rust.
This is what I have so far:
let mut schema = Schema::new();
schema.with_column(String::from("timestamp"), Datetime(Microseconds, None));
schema.with_column(String::from("level"), Utf8);
schema.with_column(String::from("category"), Utf8);
schema.with_column(String::from("file_path"), Utf8);
schema.with_column(String::from("line"), Int32);
schema.with_column(String::from("message"), Utf8);
let df = CsvReader::from_path(cli.log_file)?
    .with_schema(&schema)
    .has_header(false)
    .with_parse_dates(true)
    .with_quote_char(None)
    .finish()?;
The problem I'm facing is that when the message contains a comma, polars seems to cut the message off at that comma and throw away the rest of the line.
Is there a way to tell polars to just "greedily" grab the rest of the line into the last column?
I tried changing the type of the message schema column to List(Box::new(Utf8)), but this fails at runtime with the error:
Error: ComputeError(Owned("Unsupported data type List(Utf8) when reading a csv"))
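To make the "greedy" behaviour I'm after concrete, here is a minimal sketch using only the Rust standard library, not polars. It assumes the first five columns never contain a comma, so only the first five commas on a line act as separators. The helper name split_greedy and the sample log line are hypothetical and only there for illustration.

```rust
/// Split one line into exactly `n_cols` fields. Only the first
/// `n_cols - 1` commas act as separators, so the last field keeps
/// any further commas and quote characters verbatim.
/// (Hypothetical helper, not part of polars.)
fn split_greedy(line: &str, n_cols: usize) -> Option<Vec<&str>> {
    let fields: Vec<&str> = line.splitn(n_cols, ',').collect();
    // Lines with too few separators are reported as None.
    if fields.len() == n_cols {
        Some(fields)
    } else {
        None
    }
}

fn main() {
    // Made-up sample line; six columns as in the schema above.
    let line = r#"2023-01-01 12:00:00,INFO,net,src/main.rs,42,got "a, b", retrying"#;
    let fields = split_greedy(line, 6).expect("line has fewer than 6 fields");
    // The last field is the untouched remainder of the line.
    println!("{}", fields[5]);
}
```

In other words, the message column should end up holding everything after the fifth comma, untouched.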