0

I'm trying to run the following command in R in order to read a local tab-delimited file as a SQLite database:

library(RSQLite)
banco <- dbConnect(drv    = "SQLite",
                   dbname = "data.sqlite")
dbWriteTable(conn       = banco,
             name       = "Tarefas",
             value      = "data.tsv",
             sep        = "\t",
             dec        = ",",
             na.strings = c("", NA),
             row.names  = FALSE,
             header     = TRUE)

However, the statements above yield the following error:

Error in read.table(fn, sep = sep, header = header, skip = skip, nrows = nrows, : formal argument "na.strings" matched by multiple actual arguments

Which makes me think I'm not being able to pass na.strings explicitly as a read.delim argument. Running dbWriteTable without this argument gives me "RS-DBI driver: (RS_sqlite_import: ./data.tsv line 17696 expected 20 columns of data but found 18)". This is understandable, since I've checked line 17696 and it is almost completely blank.

Another test run using sqldf also gives me an error:

> read.csv2.sql(file      = "data.tsv",
+               sql       = "CREATE TABLE Tarefas AS SELECT * FROM FILE LIMIT 5",
+               dbname    = "data.sqlite",
+               header    = TRUE,
+               row.names = FALSE)
Error in sqliteExecStatement(con, statement, bind.data) : 
  RS-DBI driver: (error in statement: no such table: FILE)

Which I believe is an unrelated error, but still very confusing for someone who's pretty much an absolute SQL noob such as myself. Runnin read.csv.sql instead gives me this error:

Error in read.table(fn, sep = sep, header = header, skip = skip, nrows = nrows, : more columns than column names

So is there a way to pass na.strings = c("", NA) at dbWriteTable? Is there a better way to read 10 GB tab-delimited files into R aside from sqldf and RSQLite? I've already tried data.table and ff.

Waldir Leoncio
  • 10,853
  • 19
  • 77
  • 107
  • This is just a shot in the dark, but what happens if you try `na.strings = c("","NA")`? – joran Aug 07 '14 at 19:37
  • @joran, unfortunately, same error. – Waldir Leoncio Aug 07 '14 at 19:39
  • [Here](https://github.com/rstats-db/RSQLite/blob/master/R/ConnectionWrite.R#L220) is the source of the problem, I think. Investigating... – joran Aug 07 '14 at 19:40
  • 1
    Yeah, I don't see a way around this, other than perhaps pre-cleaning the text file so that the `na.strings` argument isn't needed. Either way, might be worth creating an issue on github about it... – joran Aug 07 '14 at 19:46
  • Anyway, thanks for the help @joran! BTW, do you think the "expected 20 columns of data but found 18" error has something to do with this? – Waldir Leoncio Aug 07 '14 at 19:50
  • 1
    I suspect that's a separate problem, but probably one that will also need to be addressed. I would try just reading in a few dozen lines near the problem line using `skip` and `nrow` and seeing if you can narrow it down. Usually that sort of thing comes down to a weird character or missing delimiter in your file or something. – joran Aug 07 '14 at 19:52

0 Answers0