I am using the survfit function in the R package survival to create survival curves from a model fitted with coxph. I have two methods for creating the curve, and they give different results. I believe the first gives the correct answer, but I can't tell why method 2 does not work.

library(survival)
set.seed(1234)

## generate small data set
n <- 10
z <- rnorm(n,mean=0.4)
x <- rexp(n,exp(z))
y <- pmin(1,x)
del <- 1*(x < 1)
dat <- data.frame(y,del,z)

## fit cox model
fit <- coxph(Surv(y,del)~z,ties="breslow",data=dat)

## method 1
newdata <- dat[1,]
newdata[1,3] <- 0
out <- survfit(fit,newdata=newdata)
out$surv
##[1] 0.9557533 0.9048870 0.8545721 0.7599743 0.6397022 0.4218647 0.4218647


## method 2, why not same as method 1?
dat[1,3] <- 0
out <- survfit(fit,newdata=dat[1,])
out$surv
##[1] 0.9570757 0.9079589 0.8593546 0.7710287 0.6610956 0.4787354 0.4787354
jpl2116
  • That is a puzzle. (I get the same with R 3.5.2 and survival_2.43-3). I get TRUE for `identical(newdata, dat[1,])` and adding drop=FALSE as a third argument to "[" did not affect the results. – IRTFM Mar 11 '19 at 18:29

1 Answer

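In both methods the survfit function receives two arguments: fit and newdata.

In method 1, the line newdata[1,3] <- 0 changes only the object newdata; dat is left untouched, so the data behind fit is unchanged.

In method 2, by contrast, dat[1,3] <- 0 changes dat itself. That affects both the newdata passed to survfit and the data that survfit uses together with fit. The fit object stores the call (data=dat) rather than its own copy of the data, and, as far as I can tell, survfit looks dat up again when it needs the original covariates, so it picks up the modified row.

So the newdata objects received by survfit are identical in both methods, as IRTFM correctly pointed out, but the data behind fit is not.
If you make three identical data frames at the beginning, you can see this.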

dat1 <- data.frame(y,del,z)
dat2 <- data.frame(y,del,z)
dat3 <- data.frame(y,del,z)

## fit cox model
fit <- coxph(Surv(y,del)~z,ties="breslow",data=dat1)

## method 1
newdata <- dat2[1,]
newdata[1,3] <- 0

out <- survfit(fit,newdata=newdata)
out$surv
##[1] 0.9557533 0.9048870 0.8545721 0.7599743 0.6397022 0.4218647 0.4218647


## method 2, same as method 1
dat3[1,3] <- 0
out <- survfit(fit,newdata=dat3[1,])
out$surv
##[1] 0.9557533 0.9048870 0.8545721 0.7599743 0.6397022 0.4218647 0.4218647
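
A possible alternative to keeping separate copies of the data is sketched below. It rests on an assumption: that survfit rebuilds the model frame by re-evaluating the data argument of the original call whenever the fit does not carry its own copy, and that asking coxph to store the model frame (its model = TRUE argument) avoids that lookup. The names s1 and s2 are only illustrative. If the assumption holds, both methods should then agree even though dat is modified after fitting.

```r
library(survival)
set.seed(1234)

## same small data set as in the question
n <- 10
z <- rnorm(n, mean = 0.4)
x <- rexp(n, exp(z))
y <- pmin(1, x)
del <- 1 * (x < 1)
dat <- data.frame(y, del, z)

## keep the model frame inside the fit (assumption: survfit then
## reuses it instead of looking up `dat` again)
fit <- coxph(Surv(y, del) ~ z, ties = "breslow", data = dat, model = TRUE)

## method 1: modify a copy of the first row
newdata <- dat[1, ]
newdata[1, 3] <- 0
s1 <- survfit(fit, newdata = newdata)$surv

## method 2: modify dat itself after the fit was made
dat[1, 3] <- 0
s2 <- survfit(fit, newdata = dat[1, ])$surv

## compare the two curves
all.equal(s1, s2)
```

If the comparison does not come out TRUE on your version of survival, the safer route is the one shown above: never modify the data frame that was passed to coxph.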
Oka
  • Wow. I'm really amazed at that. I suspect it's not really supposed to be that way. It violates the usual functional paradigm, to my way of thinking. – IRTFM Mar 11 '19 at 20:20
  • Thanks. I am wondering if this is a bug or me not using the function correctly. – jpl2116 Mar 12 '19 at 02:07
  • Well, this, inheritance, mutation, lazy evaluation, etc.: it just behaves in certain ways. Sometimes you can use it, and sometimes it can cause trouble if you are not aware of it. – Oka Mar 12 '19 at 02:18
  • ...and I don't think this is in any way specific to the survfit function. I think you'll come across it elsewhere as well. – Oka Mar 12 '19 at 02:23
  • And please let me know if the answer was not clear or detailed enough, or if there are any follow-up questions. – Oka Mar 12 '19 at 02:28
  • @Oka But apparently your hint was not specific enough. That's my upvote. – IRTFM Mar 24 '19 at 03:11