I have a large CSV file that I'd like to read with arrow::read_csv_arrow(). However, the file contains quoted strings that themselves contain quotes. readr::read_delim() is able to read the file (given the correct settings), while arrow::read_csv_arrow() is not:
library(arrow)
library(readr)
# create offending file
x <- tempfile()
write_lines(
'
id,text
1,Some interesting text
2,"Some text on: \"how to break arrow\" by X, and Y"
', x)
read_delim(x, delim = ",", escape_double = FALSE, escape_backslash = TRUE)
#> Rows: 2 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): text
#> dbl (1): id
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2 × 2
#>      id text
#>   <dbl> <chr>
#> 1     1 "Some interesting text"
#> 2     2 "Some text on: \"how to break arrow\" by X, and Y"
read_csv_arrow(x, escape_double = FALSE, escape_backslash = TRUE)
#> Error:
#> ! Invalid: CSV parse error: Row #3: Expected 2 columns, got 3: 2,"Some text on: "how to break arrow" by X, and Y"
Created on 2022-10-13 with reprex v2.0.2
I have tried various settings on the parser, to no avail, such as:
read_csv_arrow(x, parse_options = CsvParseOptions$create(double_quote = FALSE, escaping = TRUE))
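One thing that may be worth checking is what actually ends up in the file. In an R string literal, \" is an escape for a plain double quote, so the reprex above may be writing no backslashes at all (the arrow error message, which shows the row without backslashes, points the same way). A minimal sketch using only base R; the variable names are just for illustration:

```r
# Write the same literal as in the reprex and inspect the raw line.
x <- tempfile()
writeLines('2,"Some text on: \"how to break arrow\" by X, and Y"', x)
line <- readLines(x)

print(line)      # print() re-escapes the quotes for display
cat(line, "\n")  # cat() shows the characters as they are on disk

# fixed = TRUE makes the pattern a single literal backslash
has_backslash <- grepl("\\", line, fixed = TRUE)

# Doubling the backslash in the literal writes a real backslash to disk.
y <- tempfile()
writeLines('2,"Some text on: \\"how to break arrow\\" by X, and Y"', y)
line_escaped <- readLines(y)
has_backslash_escaped <- grepl("\\", line_escaped, fixed = TRUE)

has_backslash
has_backslash_escaped
```

If the real file does contain backslash-escaped quotes, the parser settings may behave differently on it than on this reprex, so it is probably worth retrying them against a file written with the doubled backslashes.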