
I'm writing some R extensions in C (C functions to be called from R).

My code needs to compute a statistic from 2 different datasets at a time, and I need to do this for every possible pair of datasets. Then I need all these statistics (very large arrays) to continue the calculation on the C side. The dataset files are very large, typically ~40GB, and that's my problem.

To do this in C called from R, I first need to load all the datasets in R and then pass them to the C function call. Ideally, though, only 2 of those files would have to be in memory at the same time. If I could access the datasets from C or Fortran directly, I would follow this sequence:

open  file1 - open file2 - compute cov(1,2)
close file2
hold  file1 - open file3 - compute cov(1,3)
... // same approach

This is fine in R because I can load/unload files, but when calling C or Fortran I don't have any mechanism to load/unload files. So, my question is: can I read .Rdata files from Fortran or C directly, opening and closing them as needed? Are there any other approaches to the problem?

As far as I've read, the answer is no. So, I'm considering moving from Rdata to HDF5.

srodrb
  • Of course! Any file format **can** be read but the question is: is it worth the effort? Why not just save your big files to separate `.Rdata` files and load the one you want, or as you suggested, just save them in a simpler format to begin with? – Carl Witthoft Nov 17 '14 at 22:29
  • Even with my data stored in separate files, I must load all of them before calling the C code from R. Your approach would work if it were possible to maintain a kind of stream between the C and R code... I believe that using a simpler format is the best option. – srodrb Nov 17 '14 at 22:34

1 Answer


It is not too hard to call R functions from C code that is itself invoked through the .Call interface. So write an R function that reads the data in, and invoke that function from C. When you're done with one file, UNPROTECT() the data you've read in so that it becomes eligible for garbage collection. This is illustrated in the following:

## function that reads my data in from a single file
fun <- function(fl)
    readLines(fl)

library(inline)  ## party trick -- compile C code from within R
doit <- cfunction(signature(fun="CLOSXP", filename="STRSXP", env="ENVSXP"), '
    SEXP lng = PROTECT(lang2(fun, filename)); // create R language expression
    SEXP ans = PROTECT(eval(lng, env));       // evaluate the expression
    // do things with the ans, e.g., ...
    int len = length(ans);
    UNPROTECT(2);                     // release for garbage collection
    return ScalarInteger(len);        // return something
')

doit(fun, "call.R", environment())

A simpler approach is to invert the problem -- read two data files in, then call C with the data.
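
As a minimal sketch of the C side of that inverted approach, assuming each dataset can be handed over as a plain numeric vector: the function below computes the sample covariance of two equal-length vectors. The name `cov_pair` and its argument layout are hypothetical (they do not come from the code above); every argument is a pointer because that is how R's `.C` interface passes arguments.

```c
#include <math.h>   /* NAN */

/*
 * Hypothetical helper (not part of the answer's original code): sample
 * covariance of two equal-length vectors. All arguments are pointers so
 * the function matches the calling convention of R's .C interface.
 */
void cov_pair(const double *x, const double *y, const int *n, double *out)
{
    int len = *n;
    if (len < 2) {              /* covariance is undefined for < 2 points */
        *out = NAN;
        return;
    }

    /* Single pass with running means (Welford-style co-moment update). */
    double mean_x = 0.0, mean_y = 0.0, co = 0.0;
    for (int i = 0; i < len; i++) {
        double dx = x[i] - mean_x;            /* deviation from old mean of x */
        mean_x += dx / (i + 1);
        mean_y += (y[i] - mean_y) / (i + 1);
        co += dx * (y[i] - mean_y);           /* uses updated mean of y */
    }
    *out = co / (len - 1);                    /* unbiased (n - 1) estimate */
}
```

On the R side, a loop over all pairs would load two files, make a call along the lines of `.C("cov_pair", as.double(x), as.double(y), as.integer(length(x)), out = double(1))$out`, and then `rm()` whichever dataset is no longer needed. Two caveats, as far as I know: `.C` may copy its arguments, and it does not accept long vectors (more than 2^31 - 1 elements), so for data of this size the same loop is probably better written against `.Call`.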

Martin Morgan
  • Impressive answer, Sir. Still there are some aspects about the memory usage I have to study carefully... – srodrb Nov 18 '14 at 08:12