I have an object x that contains a list of lists of matrices and model objects from lm, gbm, etc. object.size(x) reports only about 50 MB, but the file written by saveRDS is more than 5 times larger, at over 250 MB. In general, what are some common causes for an RDS file to be much larger than the corresponding object size? And what can I do to minimize the discrepancy between the object size and the file size?

EDIT:

I have trimmed down my original problem enough to give a reproducible example (I know the code calls lapply over a single element, but this is a reduced example). There seem to be at least two problems:

1) The resulting RDS files are about 2 to 3 times larger than their corresponding object sizes.

2) The objects from lapply and mclapply have nearly the same object.size, yet the resulting file is about 1.5 times larger for the object returned from mclapply.

Since fit1 and fit2 have almost the same size, inspecting the size of their components within R doesn't seem to be very helpful. Does anyone have a suggestion on how to debug this problem?

library(doParallel)
library(data.table)
library(caret)

fitModels <- function(dmy, dat, file.name) {

    methods <- list(
        list(method = 'knn', tuneLength = 1),
        list(method = 'svmRadial', tuneLength = 1)
    )

    opts <- list(
        form = as.formula('X1 ~ .'),
        data = as.data.frame(dat),
        trControl = trainControl(method = 'none', returnData = FALSE)
    )

    fit <- mclapply(methods, function(x) do.call(train, c(opts, x)), mc.cores = 2)
    saveRDS(fit, paste(file.name, 'rds', sep = '.'))
    return(fit)
}

dat <- data.frame(matrix(rnorm(5e4), nrow = 1e3))

fit1 <-   lapply(1, fitModels, dat, file.name = 'test1')
fit2 <- mclapply(1, fitModels, dat, file.name = 'test2', mc.cores = 1)

print(object.size(fit1))
print(object.size(fit2))

print(file.info('test1.rds')$size)
print(file.info('test2.rds')$size)

The output is:

2148744 bytes
2149208 bytes
[1] 4659831
[1] 6968437
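
One way to narrow this down might be to compare, component by component, the size reported by object.size with the length of the serialized form. The sketch below uses only base R on a toy object; `size_report` and `make_closure` are hypothetical helpers, not part of the code above. Note that `serialize(el, NULL)` is uncompressed, whereas saveRDS compresses by default, so the numbers only approximate file sizes.

```r
# Hypothetical helper: for each element of a list, compare object.size()
# with the length of its uncompressed serialized form.
size_report <- function(x) {
    data.frame(
        name      = if (is.null(names(x))) as.character(seq_along(x)) else names(x),
        mem_bytes = vapply(x, function(el) as.numeric(object.size(el)), numeric(1)),
        ser_bytes = vapply(x, function(el) as.numeric(length(serialize(el, NULL))), numeric(1)),
        row.names = NULL,
        stringsAsFactors = FALSE
    )
}

# Toy object (an assumption for illustration): a plain matrix and a closure
# whose enclosing environment holds a large vector.
make_closure <- function() {
    big <- rnorm(1e5)      # lives in the environment captured by the closure
    function(z) z + 1
}
toy <- list(mat = matrix(0, 10, 10), fn = make_closure())

report <- size_report(toy)
report$ratio <- report$ser_bytes / report$mem_bytes
print(report)
```

Elements with a large ratio are candidates for closer inspection; the same helper can be applied recursively to nested lists such as fit1[[1]].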
  • Duplicate of many existing questions: [Object size discrepancy](http://stackoverflow.com/a/7734342/202229), [What determines the size of a saved object in R?](http://stackoverflow.com/questions/24539106/what-determines-the-size-of-a-saved-object-in-r) ... please search and browse for your answer. – smci May 24 '15 at 06:55
  • The answers in both questions deal with different levels of compression depending on the object content, but they still end up with smaller file size. In my case I observed that the file size is much bigger than the object size. –  May 24 '15 at 21:07
  • I ran into a similar problem, tested saving each component individually, and found that a function with an object size of 65k was saved as a 19M rds file. That function used an external package, so maybe everything it referenced was serialized. – dracodoc Apr 05 '18 at 15:14
  • A more relevant existing question: https://stackoverflow.com/questions/42230920/saverds-inflating-size-of-object. FWIW, the answer given at the link sometimes (but not always) solves this issue for me when it's come up. – pbaylis Jun 28 '18 at 19:30
  • Look into [`{butcher}`](https://butcher.tidymodels.org/), as it deals with fundamentally the same issue. – Mossa Oct 26 '21 at 18:36
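
The comments about a small function being saved as a very large file, and about `{butcher}`, both point at environments that get serialized along with an object. The base-R sketch below illustrates the effect on a formula created inside a function; it is not `{butcher}`'s API, `make_formula` is a hypothetical helper, and whether this explains the numbers in the question is an assumption that would need checking.

```r
# A formula created inside a function keeps a reference to that function's
# environment, including any large locals.
make_formula <- function() {
    payload <- rnorm(1e5)   # large local kept alive by the formula's environment
    y ~ x
}

f <- make_formula()
before <- length(serialize(f, NULL))

# Assumption: nothing in the old environment is needed when the formula is
# used later. The global environment is serialized as a reference only.
environment(f) <- globalenv()
after <- length(serialize(f, NULL))

cat("serialized bytes before:", before, "after:", after, "\n")
```

Model objects often store formulas, terms, or closures, so checking `environment()` on those components may reveal what is being dragged into the file. Replacing an environment can break later calls that rely on it (for example predict or update), so this is worth testing on a copy.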

0 Answers