I have a CSV file with malformed data in the last column: that column can contain " or , characters. I would like to read this CSV into a DataFrame using polars in Rust.

I managed to read it with the following:

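// Imports assumed for this snippet (polars as of early 2023).
use polars::prelude::*;
use polars::prelude::DataType::{Datetime, Int32, Utf8};
use polars::prelude::TimeUnit::Microseconds;

let mut schema = Schema::new();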
schema.with_column(String::from("timestamp"), Datetime(Microseconds, None));
schema.with_column(String::from("level"), Utf8);
schema.with_column(String::from("category"), Utf8);
schema.with_column(String::from("file_path"), Utf8);
schema.with_column(String::from("line"), Int32);
schema.with_column(String::from("message"), Utf8);

let df = CsvReader::from_path(cli.log_file)?
    .with_schema(&schema)
    .has_header(false)
    .with_parse_dates(true)
    .with_quote_char(None)
    .finish()?;

The problem I'm facing is that if the message contains a comma, polars seems to cut the message off at that comma and throw away the rest.

Is there a way to tell polars to just "greedily" grab the rest of the line into the last column?
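
To make "greedily" concrete, here is a minimal sketch in plain Rust, outside of polars, of the splitting behavior I'm after. It assumes the first five fields never contain a comma; split_greedy is a hypothetical helper name, not a polars API.

```rust
/// Split `line` on commas into at most `n_fields` fields.
/// Only the first `n_fields - 1` commas act as separators, so the last
/// field keeps any remaining commas and quotes verbatim.
/// Hypothetical helper for illustration; not part of polars.
fn split_greedy(line: &str, n_fields: usize) -> Vec<&str> {
    // `str::splitn` stops splitting once it has produced `n_fields` items;
    // the final item is the untouched remainder of the line.
    line.splitn(n_fields, ',').collect()
}
```

With six fields, a line whose message contains commas and quotes would keep the whole message in the sixth field, instead of losing everything after the first comma in the message.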

I tried changing the type of the message column in the schema to List(Box::new(Utf8)), but this fails at runtime with the error:

Error: ComputeError(Owned("Unsupported data type List(Utf8) when reading a csv"))
  • Please provide a minimal reproducible example. The code as provided will not compile. An example of the data you're trying to load would also be helpful. – emagers Apr 23 '23 at 19:30
