4

I'm trying to create an actuarial survival analysis in R (I'm following some worked examples). I think the best way to do this is using the survival package. So something like:

library(survival)
surv.test <- survfit(Surv(TIME,STATUS), data=test)

However, to get the correct answer I will need to divide the TIME variable into 365 day intervals and I can't quite work out how to do this so it matches the given result.

As far as I can make out, there is no option within the survfit function that will do this. I went through several document examples and none of them were trying to create a stairstep type of plot (there is a type='interval' option, but seems to do something different). So I guess I need to regroup my data before I apply the survival function?

Any ideas?

P.S: In SPSS this would be INTERVAL = THRU 10000 BY 365; in Stata intervals(365) ... connect(stairsteps)

Nick Cox
  • 35,529
  • 6
  • 31
  • 47
Joanne Demmler
  • 1,406
  • 11
  • 31
  • Not sure why this question was downvoted. +1 as I think this isn't an unreasonable question. How are the TIME and STATUS variables set up right now? – TARehman Aug 09 '12 at 14:32
  • What would help is some kind of example that could be run and you telling us why the example is wrong and what we should change. – BlueTrin Aug 09 '12 at 15:45
  • Who asked if I worked at BNPP? The answer is no, incidentally. – TARehman Aug 09 '12 at 16:07
  • Why do you need to divide the `TIME` variable into intervals? Are you trying to plot the Kaplan-Meier curve (sometimes called a stairstep plot)? Or are you trying to add time-varying covariates to your model? It makes a big difference. – nograpes Aug 09 '12 at 17:19
  • The table contains roughly 9,400 records of events and the data ought to be grouped in years (according to the exercise). The STATUS variable is a simple 0,1 variable to depict if the person had a certain operation, while the TIME variable is days since another operation. I'm sorry, but I can't post images yet. Imagine that in R there is a thick flat line depicted instead of wide steps that can be seen in the SPSS and STATA output. I think the problem is that the table contains daily events and the steps are too small (TIME runs from 0 to 6249). – Joanne Demmler Aug 10 '12 at 07:08

3 Answers3

1

I am guessing that you want to divide the TIME variable into intervals because you want to plot a Kaplan-Meier curve. In R, that isn't necessary, you can just call plot on the survfit object. For example,

s=survfit(Surv(futime, fustat)~rx, data=ovarian)
plot(s)

enter image description here


I think I understand your question a little better. The reason why you are getting a thick black line is because you have a lot of censoring, and a + is being plotted at every single point where there is censoring, you can turn this off with mark.time=F. (You can see other options in ?survival:::plot.survfit)

However, if you still want to aggregate by year, simply divide your follow up time by 365, and round up. ceiling is used to round up. Here is an example of aggregating at different time levels without censoring.

par(mfrow=c(1,3))
plot(survfit(Surv(ceiling(futime), fustat)~rx, data=ovarian),col=c('blue','red'),main='Day',mark.time=F)
plot(survfit(Surv(ceiling(futime/30), fustat)~rx, data=ovarian),col=c('blue','red'),main='Month',mark.time=F)
plot(survfit(Surv(ceiling(futime/365), fustat)~rx, data=ovarian),col=c('blue','red'),main='Year',mark.time=F)
par(mfrow=c(1,1))

But I think that plotting the Kaplan-Meier without the censoring symbols will look very nice, and provide more insight.

enter image description here

nograpes
  • 18,623
  • 1
  • 44
  • 67
  • As my TIME variable is in days I only get a thick black line as output when using the standard method. The data needs to be plotted as year intervals, otherwise the steps won't be visible. – Joanne Demmler Aug 10 '12 at 07:17
0

Hurray, I should be able to post the images now:

1) this is how the R basic survival plot looks like at the moment enter image description here

2) and this is how it should look like (SPSS example) enter image description here

Joanne Demmler
  • 1,406
  • 11
  • 31
0

That was exactly what I was missing! Thanks!

enter image description here

Solution:

vas.surv <- survfit(Surv(ceiling(TIME/365), STATUS)~1, conf.type="none", data=vasectomy)
plot(vas.surv, ylim=c(0.975,1), mark.time=F, xlab="Years", ylab="Cumulative Survival")

A nice touch would be to displays the days on the x-axis instead of the years (as in SPSS) example, but I'm not too bothered about this.

Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197
Joanne Demmler
  • 1,406
  • 11
  • 31
  • Quite a few questions here on SO touch labeling axes with time components. Feel free to browse. – Roman Luštrik Aug 10 '12 at 13:26
  • Bit nasty this one as it is not just relabelling (e.g. suppressing the plot of the x-axis with xaxt='n' and then reassigning labels with the axis command) of the axis, but the tick marks placement will change as well. – Joanne Demmler Aug 10 '12 at 13:34
  • Specifically, take a look at `?survival:::plot.survfit` to find out more about this plot command. – nograpes Aug 10 '12 at 15:23
  • 1
    But all you would have to do to rescale it is use the `xscale` parameter. So for years to days you would scale it by `xscale=1/365` – nograpes Aug 10 '12 at 15:25
  • Brilliant! `?survival:::plot.survfit` is a good command to keep! – Joanne Demmler Aug 13 '12 at 10:43