When performing survival analysis in R, fitting a model appears to consume more memory than the size of the object being returned. Moreover, this happens only in some cases, not every time.
```r
library(survival)
library(pryr)
library(tidyverse)

# Simulated data: two grouping variables, an event indicator and Weibull times
dat <- tibble(
  x  = sample(letters[1:2], 1e5, replace = TRUE),
  x2 = sample(LETTERS[1:2], 1e5, replace = TRUE),
  e  = sample(0:1, 1e5, replace = TRUE),
  t  = rweibull(1e5, shape = 1)
)

# Net memory change reported for each fit
mem_change(fit  <- survfit(formula = Surv(t, e) ~ x,      data = dat))
mem_change(fit2 <- survfit(formula = Surv(t, e) ~ x,      data = dat))
mem_change(fit3 <- survfit(formula = Surv(t, e) ~ 1,      data = dat))
mem_change(fit4 <- survfit(formula = Surv(t, e) ~ x2,     data = dat))
mem_change(fit5 <- survfit(formula = Surv(t, e) ~ x + x2, data = dat))

# Size of each fitted object, then the total size of all of them together
map(list(fit, fit2, fit3, fit4, fit5), object_size)
object_size(fit, fit2, fit3, fit4, fit5)
```
For `fit` and `fit5`, `pryr::mem_change()` reports a change of about 7.5 MB, while each `fitX` object is 6.4 MB according to `pryr::object_size()`.
Are there any hidden variables created elsewhere, or is this somehow related to the C implementation underlying `survfit()`?
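One way to look for hidden data is to measure each top-level component of the returned object separately. The sketch below does that on a smaller simulated data set (1e4 rows, an assumption made only to keep it fast); `component_sizes` is an illustrative name, not part of any package.

```r
library(survival)
library(pryr)

# Smaller simulated data set (assumption: 1e4 rows instead of 1e5)
set.seed(1)
n <- 1e4
dat <- data.frame(
  x = sample(letters[1:2], n, replace = TRUE),
  e = sample(0:1, n, replace = TRUE),
  t = rweibull(n, shape = 1)
)

fit <- survfit(Surv(t, e) ~ x, data = dat)

# unclass() exposes the underlying list so each element can be measured;
# the result is a named numeric vector of sizes in bytes, largest first
component_sizes <- vapply(unclass(fit),
                          function(el) as.numeric(object_size(el)),
                          numeric(1))
sort(component_sizes, decreasing = TRUE)

# object_size() on the whole object also includes the list's own overhead
# and counts shared memory once, so the two totals need not agree
as.numeric(object_size(fit))
sum(component_sizes)
```

If no single component accounts for the gap, the extra memory reported by `mem_change()` is probably held somewhere other than the returned object.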
Edit: I'm aware that the actual modelling process may temporarily consume more memory. However, `pryr::mem_change()` is supposed to return the net change in used memory after all computations have finished and all temporary objects have been discarded.
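To make that assumption concrete, here is a minimal before/after measurement built only on base R's `gc()`. It assumes `pryr::mem_change()` follows the same gc()-based approach; `used_bytes()` and `net_mem_change()` are hypothetical helpers, and the 56-byte Ncell size assumes 64-bit R.

```r
library(survival)

# Hypothetical helper: bytes currently in use by R objects.
# gc() runs a full collection first, so unreferenced temporaries
# are freed before the cells in use are counted.
used_bytes <- function() {
  cells <- gc()[, 1]  # "used" column: Ncells and Vcells in use
  # Assumption: 64-bit R, where one Ncell is 56 bytes and one Vcell is 8 bytes
  sum(cells * c(56, 8))
}

# Hypothetical helper: net change in used memory caused by evaluating `expr`
net_mem_change <- function(expr) {
  before <- used_bytes()
  force(expr)  # the promise is evaluated in the caller's environment
  used_bytes() - before
}

# Smaller simulated data set (assumption: 1e4 rows to keep the example quick)
set.seed(1)
n <- 1e4
dat <- data.frame(
  x = sample(letters[1:2], n, replace = TRUE),
  e = sample(0:1, n, replace = TRUE),
  t = rweibull(n, shape = 1)
)

# The assignment inside the call creates `fit` in the calling environment
delta <- net_mem_change(fit <- survfit(Surv(t, e) ~ x, data = dat))
delta                                # net change in bytes
as.numeric(utils::object.size(fit))  # size of the returned object in bytes
```

Note that `utils::object.size()`, unlike `pryr::object_size()`, does not account for memory shared between objects, so its figure may differ slightly from the one `pryr` reports.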