1

I am using survival data to fit a flexible survival model (example below). I can capture the output of the fitted model as *.txt file and save it in my directory. My question is how can I read back the output into an object in r and use it to plot the fitted model and do predictions (without the need for the data to re-run the model again).

The reason I want this is because I am working on a server and can only export output rather than the data and want to use the fitted model in a shinyapp on my personal computer.

library(data.table)
library(flexsurv)
#load some data
data(pbc, package="randomForestSRC")
data <- as.data.table(na.omit(pbc))
data[, years := days/365.25]

fit <- flexsurvspline(Surv(years, status) ~ albumin, data=data, k=1, scale='hazard')
fit
capture.output(fit, file='fit.txt')

ndata <- data.frame(albumin = 3)

#plot
plot(fit, est=FALSE, ci=FALSE, col.obs = 'white')
lines(fit, newdata = ndata, ci=FALSE, col = 'green')

#predict
prob=summary(fit,type='survival',newdata=ndata, t=12)
prob

#how can I read the model output saved in .txt file and use  

#read .txt model output
fit2 <- read #???
fit2
plot(fit2, est=FALSE, ci=FALSE, col.obs = 'white')
lines(fit2, newdata = ndata, ci=FALSE, col = 'green')

prob=summary(fit2,type='survival',newdata=ndata, t=12)
prob 
Tomerikoo
  • 18,379
  • 16
  • 47
  • 61
Amer
  • 2,131
  • 3
  • 23
  • 38
  • You can't recreate the object from just the text output. If you want to save a model for later you can save it with something like `saveRDS(fit, "fit.rds")` and retrieve it later with `fit <- readRDS("fit.rds")` – MrFlick Aug 25 '21 at 03:32
  • I was about to say the same thing @MrFlick but after reading the model back in with `readRDS()` I found `identical(fit, fit2, ignore.environment = TRUE)` was FALSE... Any thoughts why the fit objects would be different? – jared_mamrot Aug 25 '21 at 04:33
  • @jared_mamrot. It's not clear why they are different. The differences seem to be in the `dfns` items which are functions. So maybe the ignore.environment option isn't ignoring the function environment for some reason. But comparing `identical(fit1$dfns$p, fit2$dfns$p)` returns FALSE when they seem to be the same. – MrFlick Aug 25 '21 at 04:45
  • @MrFlick the median estimate is identical but not the confidence interval limits. I am not sure why. but it is important to have these identical as well. – Amer Aug 25 '21 at 04:53
  • If it's not identical then it sounds like a bug with the package. Perhaps you could file a [github issue](https://github.com/chjackson/flexsurv-dev/issues) – MrFlick Aug 25 '21 at 04:56

1 Answers1

2

An example of the method @MrFlick recommended:

library(data.table)
#install.packages("flexsurv")
library(flexsurv)
#> Loading required package: survival

#load some data
data(pbc, package="randomForestSRC")
data <- as.data.table(na.omit(pbc))
data[, years := days/365.25]

set.seed(123)
fit <- flexsurvspline(Surv(years, status) ~ albumin, data=data, k=1, scale='hazard')
saveRDS(fit, "fit_object.rds")

ndata <- data.frame(albumin = 3)

#plot
plot(fit, est=FALSE, ci=FALSE, col.obs = 'white')
lines(fit, newdata = ndata, ci=FALSE, col = 'green')

#predict
set.seed(234)
prob=summary(fit,type='survival',newdata=ndata, t=12)
prob
#> albumin=3 
#>   time        est        lcl       ucl
#> 1   12 0.06864802 0.02431122 0.1567719

fit2 <- readRDS(file = "fit_object.rds")
fit2
#> Call:
#> flexsurvspline(formula = Surv(years, status) ~ albumin, data = data, 
#>     k = 1, scale = "hazard")
#> 
#> Estimates: 
#>          data mean  est      L95%     U95%     se       exp(est)  L95%   
#> gamma0        NA     2.7412   1.2590   4.2235   0.7563       NA        NA
#> gamma1        NA     0.9900   0.5100   1.4701   0.2449       NA        NA
#> gamma2        NA    -0.0342  -0.0869   0.0186   0.0269       NA        NA
#> albumin   3.5168    -1.6959  -2.1481  -1.2436   0.2307   0.1834    0.1167
#>          U95%   
#> gamma0        NA
#> gamma1        NA
#> gamma2        NA
#> albumin   0.2883
#> 
#> N = 276,  Events: 111,  Censored: 165
#> Total time at risk: 1495.551
#> Log-likelihood = -374.4158, df = 4
#> AIC = 756.8316

identical(fit, fit2, ignore.environment = TRUE)
#> [1] FALSE

plot(fit2, est=FALSE, ci=FALSE, col.obs = 'white')
lines(fit2, newdata = ndata, ci=FALSE, col = 'green')

set.seed(234)
prob=summary(fit2,type='survival',newdata=ndata, t=12)
prob
#> albumin=3 
#>   time        est        lcl       ucl
#> 1   12 0.06864802 0.02431122 0.1567719

Created on 2021-08-25 by the reprex package (v2.0.0)

Update: if you use the same seed (via set.seed()) you get the same est/lcl/ucl for "fit" and "fit2". Also, based on MrFlick's comment, even though the objects aren't identical I believe getting the same output using the same seed shows that they are functionally the same (i.e. 'good enough'). You could test this further with your actual data and compare outcomes on the server vs locally.

jared_mamrot
  • 22,354
  • 4
  • 21
  • 46