0

I have some data that looks something like this:

myDat <- data.frame(V1=rnorm(1000), 
                    V2=sample(LETTERS, 1000, replace=TRUE), 
                    V3=sample(1:1000, 1000))
write.table(myDat, file="myData.txt", sep="|", quote=FALSE, col.names=FALSE, row.names=FALSE)
write("00001000 RECORDS", file="myData.txt", append=TRUE) # This is the annoying footer

Note the footer at the bottom that is always there.. I would like to read it in using sqldf. Something like this:

f <- file("myData.txt")
hkFile <- sqldf("SELECT * FROM f", dbname=tempfile(), 
                file.format=list(header=FALSE, sep="|", eol="\n"))

I keep getting this error because of the footer appended to the data:

Error in .local(conn, name, value, ...) : 
  RS_sqlite_import: myData.txt line 1001 expected 3 columns of data but found 1

Is there a simple way to get around the footer? Copying the data to a temporary file is not an option since some of my datasets are ~6GB

statsNoob
  • 1,325
  • 5
  • 18
  • 36

1 Answers1

1

Just because the file is large does not necessarily mean that you could not copy it.

This will do it but it will transparently copy the file.

my <- read.csv.sql("myData.txt", sql = "select * from file", sep = "|", 
  header = FALSE, filter = "head -n -1")

On Windows it assumes that Rtools is installed and in that case usually it will be able to find head but if not give the path, e.g. filter = "C:/Rtools/bin/head -n - 1" . If on Linux then head should already be available.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • This worked, thanks! Could you elaborate on what you mean by "transparently copy" and what is going on there? – statsNoob Nov 24 '15 at 15:57
  • It will run the input file through `head -n -1` creating a temporary file and then read that temporary in so a copy is made. Try adding the `verbose=TRUE` to the `read.csv.sql` arguments to see details. – G. Grothendieck Nov 24 '15 at 16:09