library(tidyverse)
I have this stupid csv file where someone forgot to quote the string in the last column which may contain newlines. (The 2nd row in this example doesn’t)
csv_file <- str_c(
"a,b,c\n",
"1,1,first\nrow\n",
"1,1,second row\n",
"1,1,third\nrow\n",
collapse = ""
)
Trying to read the file with read_csv()
gives rise to problems.
read_csv(csv_file)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
#> Rows: 5 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): a, c
#> dbl (1): b
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 5 × 3
#> a b c
#> <chr> <dbl> <chr>
#> 1 1 1 first
#> 2 row NA <NA>
#> 3 1 1 second row
#> 4 1 1 third
#> 5 row NA <NA>
How can one parse the file to get the expected results?
tibble(
a = c(1, 1, 1),
b = c(1, 1, 1),
c = c("first\nrow", "second row", "third\nrow")
)
#> # A tibble: 3 × 3
#> a b c
#> <dbl> <dbl> <chr>
#> 1 1 1 "first\nrow"
#> 2 1 1 "second row"
#> 3 1 1 "third\nrow"
Created on 2023-01-06 with reprex v2.0.2