1

I'm trying to write a set of functions where the first function fits a cox model (via coxph in the survival package in R), and the second function gets estimated survival for a new dataset, given the fitted model object from the first function. I'm running into some sort of scoping issue that I don't quite know how to solve without substantially re-factoring my code (the only way I could think to do it would be much less general and much harder to read).

I have a very similar set of functions that are based on the glm function that do not run into the same issue and give me the answers I would expect. I've included a short worked example below that demonstrates the issue. The glue.cox and glue.glm are functions that have the basic functionality I am trying to get. glue.glm works as expected (yielding the same values from a calculation in the global environment), but the glue.cox complains that it can't find the data that was used to fit the cox model and ends with an error. I don't understand how to do this with substitute but I suspect that is the way forward. I've hit a wall with experimenting.

 library(survival)

 data.global = data.frame(time=runif(20), x=runif(20))
 newdata.global = data.frame(x=c(0,1))
 f1 = Surv(time) ~ x # this is the part that messes it up!!!!! Surv gets eval
 f2 = time ~ x # this is the part that messes it up!!!!! Surv gets eval
 myfit.cox.global = coxph(f1, data=data.global)
 myfit.glm.global = glm(f2, data=data.global)
 myfit.glm.global2 = glm(time ~ x, data=data.global)


 myfit.cox <- function(f, dat.local){
   coxph(f, data=dat.local)
 }

 myfit.glm <- function(f, dat.local){
   glm(f, data=dat.local)
 }

 mypredict.cox <- function(ft, dat.local){
   newdata = data.frame(x=c(0,1))
   tail(survfit(ft, newdata)$surv, 1)
 }

 mypredict.glm <- function(ft, dat.local){
   newdata = data.frame(x=c(0,1))
   predict(ft, newdata)
 }

 glue.cox <- function(f, dat.local){
   fit = myfit.cox(f, dat.local)
   mypredict.cox(fit, dat.local)
 }

 glue.glm <- function(f, dat.local){
   fit = myfit.glm(f, dat.local)
   mypredict.glm(fit, dat.local)
 }

 # these numbers are the goal for non-survival data
 predict(myfit.glm.global, newdata = newdata.global)               

0.5950440 0.4542248

 glue.glm(f2, data.global) 

0.5950440 0.4542248 # this works

 # these numbers are the goal for survival data
 tail(survfit(myfit.cox.global, newdata = newdata.global)$surv, 1)

[20,] 0.02300798 0.03106081

 glue.cox(f1, data.global) 

Error in eval(predvars, data, env) : object 'dat.local' not found

alex keil
  • 1,001
  • 7
  • 14

1 Answers1

1

This appears to work, at least in the narrow sense of making glue.cox() work as desired:

myfit.cox <- function(f, dat.local){
    environment(f) <- list2env(list(dat.local=dat.local))
    coxph(f, data=dat.local)
}

The trick here is that most R modeling/model-processing functions look for data in the environment associated with the formula.

I don't know why glue.glm works without doing more digging, except for the general statement that [g]lm objects store more of the information needed for downstream processing internally (e.g. in the $qr element) than other model types.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Beautiful - it fixed the working example as well as the actual example. Thank you! It looks like this might have been the reason for the `cph` and `survift.cph` functions in the `rms` package. The explanation in the documentation is similar to your thoughts about why it worked with `[g]lm` (the `coxph` fit objects don't contain the original data, but the `cph` objects do) - I ran into other problems with that package, though. – alex keil Nov 10 '19 at 22:18