I created a bagged tree model (method = "treebag") using the caret package in R, and the resulting model is 12 MB when viewed in RStudio. But when I save it to disk for later use with save(), the size on disk balloons to 151 MB! Different compression schemes bring the size down a bit, but all are still far larger than the in-memory size. Has anyone successfully dealt with this problem?

2 Answers
The likely reason is that the enclosing environment associated with an object is not counted by object.size(), but it is written to disk when the object is saved. Use pryr::object_size() to see the object's size with the environment included. More explanation can be found at: http://adv-r.had.co.nz/memory.html#object-size
> object.size(m1)
16200200 bytes
> pryr::object_size(m1)
215 MB
> save(m1, file="m1.rda")
> file.info("m1.rda")$size
[1] 219475772
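The effect is easy to reproduce without caret: any object that captures a large enclosing environment (a closure, or a model formula's environment) looks small to object.size() but drags that environment to disk when serialized. A minimal sketch in base R (the names `make_model`, `big`, and `f` are illustrative):

```r
# A closure that captures a large vector in its enclosing environment
make_model <- function() {
  big <- rnorm(1e6)          # ~8 MB, kept alive in the closure's environment
  function(x) x + big[1]     # the returned function carries 'big' with it
}
f <- make_model()

object.size(f)               # small: the captured environment is not counted
# pryr::object_size(f)       # much larger: includes the captured environment

tmp <- tempfile()
save(f, file = tmp)
file.info(tmp)$size          # on disk, 'big' is serialized along with 'f'
unlink(tmp)
```

The same mechanism applies to model objects: components like `terms` or `call` can reference environments holding the training data, so the serialized file is far bigger than what object.size() reports.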
There has also been some discussion of this issue in another question: object.size() reports smaller size than .Rdata file
Are you talking about the train object? The bagging function isn't very optimized, and a lot of redundant objects are saved in the forest (e.g. each terms object for every rpart model).

See the trim option of trainControl. If you only want to make predictions with the object, this gets rid of a lot of the extra material carried around by the model object. In some cases, the call object can contain a complete copy of the data. trim isn't implemented for every model, but it is for this one.

- Yes, the train object. And yes, I used `trim`. But I don't think this is getting to the point. _Why the huge increase when saved to disk?_ I don't think the `trainControl` parameters should affect that. – jtdoud Aug 22 '15 at 00:24
- PS... I also used `returnData = F` – jtdoud Aug 22 '15 at 00:31