3

I created a bagged tree model (method = "treebag") using the caret package in R. The resulting model is 12 Mb when viewed in RStudio, but when I save it to disk for later use with save(), the size on disk balloons to 151 Mb! Different compression schemes bring the size down a bit, but all results are still far larger than the in-memory size. Has anyone successfully dealt with this problem?

jtdoud

2 Answers

3

The likely reason is that object.size() does not count the enclosing environment associated with an object, but that environment is written to disk when the object is saved. Use pryr::object_size() to see the object's size with its environment included. More explanation can be found at: http://adv-r.had.co.nz/memory.html#object-size

> object.size(m1)
16200200 bytes
> pryr::object_size(m1)
215 MB
> save(m1, file="m1.rda")
> file.info("m1.rda")$size
[1] 219475772

There also has been some discussion of this issue in another question: object.size() reports smaller size than .Rdata file
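The effect above can be reproduced without caret at all. The following is a minimal, self-contained sketch (the function and variable names are illustrative, not from the original post) showing how a function that closes over a large environment serializes far larger than object.size() reports, and how dropping the environment shrinks the serialized form:

```r
# A closure that captures a large vector in its enclosing environment.
make_model <- function() {
  big <- runif(1e6)        # ~8 MB numeric vector, captured by the closure
  function(x) x + 1        # the function body itself is tiny
}
f <- make_model()

object.size(f)               # small: counts only the function itself
length(serialize(f, NULL))   # large: serialization includes `big`

environment(f) <- globalenv()  # discard the enclosing environment
length(serialize(f, NULL))     # small again (but `f` can no longer see `big`)
```

This is the same mechanism at work in model objects: formulas, terms, and fitted functions carry environments that save() faithfully writes out. Stripping environments from a fitted model can break prediction, so test predict() afterwards before relying on it.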

LmW.
2

Are you talking about the train object?

The bagging function isn't very optimized and a lot of redundant objects are saved in the forest (e.g. each terms object for every rpart model).
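As a hedged illustration of that redundancy (the field names below assume ipred's bagging structure, where each tree sits in mtrees[[i]]$btree; inspect your own object with str() first), the per-tree terms environments could be nulled out by hand:

```r
# Sketch: drop the environment attached to each rpart tree's terms object
# in a bagged fit. `fit` is assumed to be a caret train object whose
# finalModel is an ipred bagging object -- an assumption, not a guarantee.
for (i in seq_along(fit$finalModel$mtrees)) {
  attr(fit$finalModel$mtrees[[i]]$btree$terms, ".Environment") <- NULL
}
```

Verify that predict(fit, newdata) still works afterwards; the trim option described below is the supported way to get the same effect.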

See the trim option of trainControl. If you only want to make predictions on that object, this will get rid of a lot of extra stuff carried over by the model object. In some cases, the call object can contain a complete copy of the data.

trim isn't implemented for every model, but it is for this one.
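A minimal sketch of the suggestion above, assuming the caret and ipred packages are installed (the cross-validation settings and file name are illustrative):

```r
library(caret)

# trim = TRUE discards pieces of the final model not needed for prediction;
# returnData = FALSE keeps a copy of the training data out of the object.
ctrl <- trainControl(method = "cv", number = 5,
                     trim = TRUE, returnData = FALSE)

fit <- train(Species ~ ., data = iris,
             method = "treebag", trControl = ctrl)

saveRDS(fit, "fit.rds")
file.info("fit.rds")$size   # compare against a fit trained without trim
```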

topepo
  • Yes, the train object. And yes I used `trim`. But I don't think this is getting to the point. _Why the huge increase when saved to disk?_ I don't think the `trainControl` parameters should affect that. – jtdoud Aug 22 '15 at 00:24
  • 1
    PS... I also used `returnData = F` – jtdoud Aug 22 '15 at 00:31