EDIT: long story short, duckdb_read_csv
needs the quote=""
just like read.csv
does in the previous answer (preserved, below). You may still need to consider some of the reparative steps for removing other quotes that may now be retained by this.
Reprex:
duck <- DBI::dbConnect(duckdb::duckdb())
writeLines(c('a,b,d', '1,"Boticas MiCasa" RSL,3'), "fake.csv")
duckdb::duckdb_read_csv(duck, "faketable", "fake.csv")
# Error: rapi_execute: Failed to run query
# Error: Invalid Input Error: Error in file "/home/r2/fake.csv" on line 2: quote should be followed by end of value, end of row or another quote. ( file=/home/r2/fake.csv
# delimiter=','
# quote='"'
# escape='"' (default)
# header=1
# sample_size=20480
# ignore_errors=0
# all_varchar=0).
duckdb::duckdb_read_csv(duck, "faketable", "fake.csv", quote="")
DBI::dbGetQuery(duck, "select * from faketable")
# a b d
# 1 1 "Boticas MiCasa" RSL 3
Just a thought to help troubleshoot, not sure it'll work.
When the error is about quotes, you can turn off quote process and try again. It will result in having quotes all throughout your data, but (1) these are easy to remove, and (2) at least you can see where the problem is.
Unfortunately ... if the data has embedded commas within strings (which would make the use of quotes required), then this won't work.
Demo:
# normal
read.csv(text='a,b,d\n"1",2,3\n"4",5,6')
# a b d
# 1 1 2 3
# 2 4 5 6
read.csv(text='a,b,d\n"1",2,3\n"4,5,6')
# Error in read.table(file = file, header = header, sep = sep, quote = quote, :
# incomplete final line found by readTableHeader on 'text'
We can disable processing of quotes by adding quote=""
, and see the problematic row:
read.csv(text='a,b,d\n"1",2,3\n"4,5,6', quote="")
# a b d
# 1 "1" 2 3
# 2 "4 5 6
We can clean up both the good ("1"
) and bad ("4
) strings with this pipeline:
read.csv(text='a,b,d\n"1",2,3\n"4,5,6', quote="") |>
lapply(function(z) if (is.character(z)) type.convert(gsub(r"(^['"]|['"]$)", "", z), as.is=TRUE) else z) |>
as.data.frame()
# a b d
# 1 1 2 3
# 2 4 5 6
If you have embedded quotes (requiring CSV's double-quoting escape, as in "hello ""world"" again"
, then you'd need another step in the cleanup to work on those middle quotes.