
I like the GBM package in R.

I can't get R's memory management to work with my combination of machine, data set and task, for reasons that have been covered elsewhere and should be considered off topic for the purposes of this question.

I would like to "rip" the GBM algorithm out of R and rebuild it as standalone code.

Unfortunately, there is no Makefile in the package tarball (or indeed in any R package tarball I've seen). Is there a place where I can find straightforward Makefiles for R packages? Or do I really have to go all the way back to square one and write my own Makefile for the long, painful journey ahead?

Stephan
  • A lot of us really appreciate R. It is a fine tool, which might not work optimally in all cases, in which case one can turn to, e.g., C++. Throwing around these kinds of harsh accusations with no explanation makes the chance of us helping you very small, i.e. -1... – Paul Hiemstra Sep 07 '12 at 22:18
  • I'm sorry. It has been a long week, and you're quite right. I used the wrong tone; however, I can't think of other words for memory management that takes over 4Gb of RAM to store 350Mb of unsigned chars. I'd like to know how far over 4GB it goes, but it allocates RAM by doubling the previous allocation, which makes it grow geometrically. Like I say, it's been a long week. Apologies for getting emotional. – Stephan Sep 07 '12 at 22:22
  • Look at packages such as `ff` for more compact storage. – Dirk Eddelbuettel Sep 07 '12 at 22:22
  • They don't cover more than 250 columns on my machine and are therefore not the solution. C++ seems to be the only way to use 350Mb to store 350Mb's worth of data, but it looks like to get there I need to write my own Makefile... – Stephan Sep 07 '12 at 22:25
  • @Stephan, you are probably doing something suboptimal. I've had no problem reading sets of, e.g., doubles and getting the expected memory usage. You could post a reproducible example that shows the excessive memory usage... – Paul Hiemstra Sep 07 '12 at 22:26
  • There's no way R takes 4Gb of RAM to store 350Mb of unsigned chars. I showed you in [a previous question](http://stackoverflow.com/q/12271274/271616) that there are ways to avoid creating copies, which is likely the problem you're having. In my answer, I read in a 700Mb CSV using **at most** 1.5Gb of RAM. You need to provide some evidence before I'm going to believe that it takes >2x the RAM to store half the data. – Joshua Ulrich Sep 07 '12 at 22:26
  • @JoshuaUlrich and a call to the garbage collector will probably free the excess 700 mb. I've worked with >2 gb binary files and calculated covariance matrices iteratively, so a 350 mb file should be doable. – Paul Hiemstra Sep 07 '12 at 22:29
  • @stephan, maybe you could edit out the anger, leaving a valid question. – Paul Hiemstra Sep 07 '12 at 22:30
  • @JoshuaUlrich 350Mb of sparse, mostly-zero data stored as uint8s immediately explodes to 1.4Gb because R doesn't support anything smaller than a 32-bit number. That's before you allow anything for metadata or the behaviour of the garbage collector during a CSV read operation. – Stephan Sep 07 '12 at 22:35
  • @stephan retracted my -1, thanks for the edit! – Paul Hiemstra Sep 07 '12 at 22:40
  • @Stephan: Sorry, but I'd like reproducible evidence, not a story. Again, I showed in the previous question that you can read 700Mb of data (2e8 3-digit integers) as numeric while only using at most 1.5Gb of RAM and the object only took 760Mb to store once it was read in. – Joshua Ulrich Sep 07 '12 at 23:35

2 Answers


As Henry Spencer quipped: "Those who do not understand Unix are condemned to reinvent it, poorly."

R packages do not ship a Makefile because R creates one on the fly when building the package, combining the defaults of the current R installation (on most installations recorded in `$(R_HOME)/etc/Makeconf`) with the package's own settings, typically given in the file `src/Makevars`.

Run the usual command `R CMD INSTALL foo_1.2.3.tar.gz` and you will see the effect of the generated Makefile as the build proceeds, since each compiler invocation is echoed. Worst case, you can always start by copying and pasting those commands.
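
For the copy-and-paste route, the build can be reproduced with a small hand-written Makefile. The sketch below is hypothetical: it assumes the package's C++ sources (`*.cpp`) sit in `src/` and have already been freed of any dependence on R's headers, and the output name `libgbm_standalone.a` is made up. `R CMD config CXX` and `R CMD config CXXFLAGS` print the compiler and flags your R installation uses, which you can paste in place of the placeholder values.

```make
# Minimal standalone Makefile sketch (hypothetical; not generated by R).
# Assumes the package's C++ sources are in src/ and no longer include
# R headers such as R.h or Rinternals.h.

# Placeholder values: replace with the output of `R CMD config CXX`
# and `R CMD config CXXFLAGS` to match R's own build settings.
CXX      = g++
CXXFLAGS = -O2

SRCDIR  := src
SOURCES := $(wildcard $(SRCDIR)/*.cpp)
OBJECTS := $(SOURCES:.cpp=.o)

# Hypothetical name for the resulting static library.
LIB := libgbm_standalone.a

all: $(LIB)

# Bundle all object files into a static archive.
$(LIB): $(OBJECTS)
	$(AR) rcs $@ $^

# Compile each .cpp file in src/ into a matching .o file.
$(SRCDIR)/%.o: $(SRCDIR)/%.cpp
	$(CXX) $(CXXFLAGS) -c $< -o $@

clean:
	rm -f $(OBJECTS) $(LIB)

.PHONY: all clean
```

`make` builds the archive and `make clean` removes the build products. A separate program with its own `main()` that calls into the library would still be needed to run the algorithm outside R.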

Dirk Eddelbuettel

You could also take a look at CMake, which can quite easily generate Makefiles for you. It took me very little time to get it working for one of my own projects.

Paul Hiemstra