I have an object x that contains a list of lists of matrices and model objects from lm, gbm, etc. object.size(x) shows only about 50MB, but the file resulting from saveRDS is more than 5 times larger at more than 250MB. In general, what are some of the common causes for the RDS file to be much larger than the corresponding object size? And what can I do to minimize the discrepancy between the object size and the file size?
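One thing that may be relevant here, as a minimal sketch rather than a diagnosis: the documentation for object.size says that associated space, such as the environment of a function, is not included in its calculation, whereas serialization (which saveRDS relies on) does write out such environments. The names make_closure and big below are made up for illustration.

```r
# Hypothetical helper: returns a closure whose enclosing environment
# holds a large vector that the function itself never uses.
make_closure <- function() {
  big <- rnorm(1e6)  # roughly 8 MB of doubles, kept alive by the closure
  function() NULL
}

f <- make_closure()

# Size reported by object.size (the closure's environment is not counted)
in_memory <- as.numeric(object.size(f))

# Length of the uncompressed serialization (the environment is written out)
serialized <- length(serialize(f, NULL))

print(in_memory)
print(serialized)
```

If something similar happens inside the model objects (formulas and closures both carry environments), it could account for part of the gap, although I have not confirmed that this is what is going on in my case.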
EDIT:
I have trimmed down my original problem enough to give a reproducible example (I know the code is lapplying over one element, but this is a reduced example). There seem to be at least 2 problems:
1) The resulting RDS files are about 2 to 3 times larger than their corresponding object size.
2) The objects from lapply and mclapply have nearly the same object.size, yet the resulting file is 1.5 times larger for the object returned from mclapply.
Since fit1 and fit2 have almost the same size, inspecting the size of their components within R doesn't seem to be too helpful. Does anyone have suggestions on how to debug this problem?
library(doParallel)
library(data.table)
library(caret)
fitModels <- function(dmy, dat, file.name) {
    # dmy is a dummy argument that only receives the element being iterated over
    methods <- list(
        list(method = 'knn', tuneLength = 1),
        list(method = 'svmRadial', tuneLength = 1)
    )
    opts <- list(
        form = as.formula('X1 ~ .'),
        data = as.data.frame(dat),
        trControl = trainControl(method = 'none', returnData = FALSE)
    )
    # Fit each model in a forked worker, then save the list of fits
    fit <- mclapply(methods, function(x) do.call(train, c(opts, x)), mc.cores = 2)
    saveRDS(fit, paste(file.name, 'rds', sep = '.'))
    return(fit)
}
dat <- data.frame(matrix(rnorm(5e4), nrow = 1e3))
fit1 <- lapply(1, fitModels, dat, file.name = 'test1')
fit2 <- mclapply(1, fitModels, dat, file.name = 'test2', mc.cores = 1)
print(object.size(fit1))
print(object.size(fit2))
print(file.info('test1.rds')$size)
print(file.info('test2.rds')$size)
The output is:
2148744 bytes
2149208 bytes
[1] 4659831
[1] 6968437
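One way to dig further, sketched under the assumption that captured environments are part of the problem: compare object.size with the uncompressed serialized length, component by component. size_report and fit_in_function are hypothetical helpers, not part of caret or base R, and the lm fit is only a small stand-in for the caret models above.

```r
# Hypothetical helper: for each element of a list-like object, report
# its object.size and the length of its uncompressed serialization.
size_report <- function(x) {
  nm <- names(x)
  if (is.null(nm)) nm <- as.character(seq_along(x))
  data.frame(
    component  = nm,
    object     = vapply(x, function(el) as.numeric(object.size(el)), numeric(1)),
    serialized = vapply(x, function(el) as.numeric(length(serialize(el, NULL))), numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Stand-in model: the formula is created inside the function, so its
# environment is the function's frame, which also holds `junk`.
fit_in_function <- function() {
  junk <- rnorm(1e5)  # unrelated local variable, about 800 KB
  d <- data.frame(y = rnorm(100), x = rnorm(100))
  lm(y ~ x, data = d)
}

report <- size_report(fit_in_function())
print(report)
```

Applied to the example above, size_report(fit1[[1]][[1]]) would list the components of the first train object. Components whose serialized size is far larger than their object size are candidates for carrying an environment along. Note that each component is serialized on its own here, so an environment shared by several components is counted once per component.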