I am using the ff package in R because I have a huge dataset (around 16 GB) to work with. As a test case, I read around 1M records from the file and wrote them out as an ff database:
system.time(te3 <- read.csv.ffdf(file="testdata.csv", sep = ",", header=TRUE, first.rows=10000, next.rows=50000, colClasses=c("numeric","numeric","numeric","numeric")))
I have uploaded the resulting file (te3) here: http://bit.ly/1c8pXqt
I then tried a simple calculation to create a new variable:
ffdfwith(te3, {odfips <- ofips*100000 + dfips})
I get the following error (there are no missing records) which has flummoxed me:
Error in if (by < 1) stop("'by' must be > 0") :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In chunk.default(from = 1L, to = 1000000L, by = 2293760000, maxindex = 1000000L) :
  NAs introduced by coercion
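One thing I did notice: the by value in the warning (2293760000) is larger than R's integer maximum, so my guess (not confirmed against the ff source) is that it becomes NA when coerced to integer, and the `if (by < 1)` check then fails. A minimal sketch in base R, independent of ff, that gives the same error message; the variable names here are mine, not ff's:

```r
# Value of `by` copied from the warning message above
by <- 2293760000

# R integers are 32-bit, so compare against the largest representable value
too_big <- by > .Machine$integer.max

# Coercing an out-of-range double to integer yields NA (with a warning)
by_int <- suppressWarnings(as.integer(by))

# Using NA in an if() condition raises the error I am seeing
msg <- tryCatch(
  if (by_int < 1) "by < 1" else "by >= 1",
  error = function(e) conditionMessage(e)
)

print(too_big)
print(is.na(by_int))
print(msg)
```

If that is right, the question becomes why ffdfwith picks such a large by for a 1M-row table, and how to make it choose a smaller one.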
Any insights will be appreciated. Also, related to ff: is it possible to use standard R packages such as MCMC (I need the inverse gamma function) with ff databases?
TIA,
Krishnan