
I have a large ffdf data frame saved to disk that I need to load into a fresh R session. When I run load.ffdf in the directory where the file is located, I get the following error message:

load.ffdf("./ffdb")
#    Error in `filename<-.ff`(`*tmp*`, value = "./custTrans$custKey.ff") : 
#    ff file rename from './custTrans$custKey.ff' to
#    'mylocation'/ffdb/custTrans$custKey.ff' failed

I really want to read these files. Is there a way to force them to load? Is there some way to read the individual ff column files directly? What format are they in? Perhaps I can place them manually in the temporary location that the underlying ff package uses?

I've had a look through the save.ffdf and load.ffdf functions, but that hasn't given me any easy fixes.

Background: I originally saved the data frame custTrans to the default ./ffdb directory. I actually wanted it in the directory ./custTrans, so I used move.ffdf to move the files. The column files were moved, but the .RData and .Rprofile files were not. I have tried loading the data from the ./ffdb directory, and I have also copied the .RData and .Rprofile files to the ./custTrans directory and run load.ffdf there. I have also tried moving the data files back to the ./ffdb directory. The error message is the same in every case.

dynamo
  • I've added a working fix to my problem, but I would love to understand better how this *should* be done and what went wrong in my case! (Or: what I was doing wrong.) – dynamo Oct 02 '13 at 08:06
  • I think the best thing in your case would be to move your data back to the original directory, run load.ffdf on it, and then save.ffdf it to the directory you want. – Oct 02 '13 at 10:14

1 Answer


I've found a partial solution to the problem: I can now read the raw ff files using readBin. Since my ffdf is still loaded in the session, I can use it to see the data type of each column and the column lengths. Printing the object gives me this information:

custTrans

readBin, with what and n set according to the information printed above, then reads the files. The columns can then be combined back into an ffdf in the usual way.

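library(ff)
# what and n come from printing custTrans (column types and length)
custKey <- readBin("./custTrans/MINS$custKey.ff", what = "int", n = 268820)
Transactiondate.max <- readBin("./custTrans/MINS$Transactiondate.max.ff",
                               what = "double", n = 268820)
Transactiondate.min <- readBin("./custTrans/MINS$Transactiondate.min.ff",
                               what = "double", n = 268820)
# Wrap each in-memory vector as an ff vector and combine them into an ffdf
custTrans <- ffdf(custKey = as.ff(custKey),
                  Transactiondate.max = as.ff(Transactiondate.max),
                  Transactiondate.min = as.ff(Transactiondate.min))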
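When there are more than a few columns, the per-column readBin calls above could be wrapped in a small helper. This is a minimal sketch, not part of ff or ffbase: read_ff_columns is a hypothetical name, and it assumes, as the readBin calls above do, that each "<prefix>$<column>.ff" file holds a headerless vector whose type (what) and length (n) you take from printing the ffdf. It returns a plain data.frame and stops if a file is shorter than expected, since readBin otherwise returns fewer values without complaint.

```r
# Hypothetical helper: read several raw ff column files into one data.frame.
# Assumes each file "<dir>/<prefix>$<column>.ff" holds a headerless,
# native-endian vector readable by readBin(), as in the calls above.
read_ff_columns <- function(dir, prefix, types, n) {
  cols <- lapply(names(types), function(col) {
    path <- file.path(dir, paste0(prefix, "$", col, ".ff"))
    values <- readBin(path, what = types[[col]], n = n)
    # readBin() silently returns fewer values if the file is shorter than n
    if (length(values) != n) {
      stop(sprintf("%s: expected %s values, read %s", path, n, length(values)))
    }
    values
  })
  names(cols) <- names(types)
  as.data.frame(cols)
}
```

With the files from this answer, a call would look something like as.ffdf(read_ff_columns("./custTrans", "MINS", list(custKey = "integer", Transactiondate.max = "double", Transactiondate.min = "double"), n = 268820)), though I have not run it against the original files.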

This obviously assumes that all of this fits into memory, which it does. (The size of the files was not the problem; generating them took a very long time.)
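If the columns had not fit in memory, one option is to read each file in fixed-size chunks through a binary connection and hand each chunk to a callback. This is a hypothetical sketch, not something tested against real ff files: read_in_chunks and its arguments are illustrative names, and it relies on the same headerless-file assumption as the readBin calls above.

```r
# Hypothetical chunked reader: streams up to n values of type `what` from a
# raw binary file and calls fun(block, from, to) for each chunk it reads.
read_in_chunks <- function(path, what, n, chunk_size, fun) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  done <- 0
  while (done < n) {
    k <- min(chunk_size, n - done)
    block <- readBin(con, what = what, n = k)
    if (length(block) == 0) break  # reached end of file before n values
    fun(block, done + 1, done + length(block))
    done <- done + length(block)
  }
  invisible(done)  # number of values actually read
}
```

The callback could, for example, write each chunk into a preallocated ff vector (out <- ff(vmode = "integer", length = n), then function(block, from, to) out[from:to] <<- block), so only one chunk is held in RAM at a time; that ff part is an untested assumption here.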

dynamo