Survival statistics in R

Question

I have 100 repeats in 15 independent categories; each individual's survival is recorded at 5 different stages.

An example of the data is as follows:

Category           Stage of death
1                        3
1                        2
1                        3
1                        3
2                        1
2                        1
2                        1
2                        1
3                        5
3                        5
3                        4
3                        4
4                        3
4                        ..........etc

I also have positive and negative controls and would like to compare the groups to see which groups have a significantly low survival rate.

Any recommended statistical analysis or R packages capable of analysing this data would be received gladly.

Try to give a reproducible example so that people can help you. — denis, Jan 16 '18 at 14:15
The response by denis below is very informative, but in general you will probably get more useful responses from the r-help user forum, which is very active. A web search should find r-help. — Robert Dodier, Jan 16 '18 at 17:51

denis · Answer 1 · 2018-01-16T14:32:06.840

The basic tool for survival statistics in R is the survival package. Here is an example of the basics for what you want to do: compare two survival curves (Kaplan-Meier curves) using a Cox regression or a log-rank test.

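library(survival)
library(data.table)  # data.table() and := are used below
# Simulated data: follow-up time, event indicator (1 = event, 0 = censored) and ID
dummyex <- data.table(treatment_duration = sample(1:10, 50, replace = TRUE),
                      stopany = sample(c(0, 1), 50, replace = TRUE),
                      ID = 1:50)
# Randomly assign each individual to one of two groups
dummyex[, seropositive := sample(c(0, 1), 1), by = ID]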

Here the variable seropositive defines two curves: one for seropositive = 1 and one for seropositive = 0. It is the equivalent of your category. The stopany variable indicates whether the event you are studying happened (1) or not (0); this matters if some individuals in your data were lost to follow-up. If you do:

stopanydummy <- survfit(Surv(treatment_duration,stopany)~seropositive,data= dummyex)

It will construct the two survival curves, one for each value of seropositive, which you can then plot:

plot(stopanydummy[1], col = 1, main = "survival plot", xlab = "time", ylab = "proportion of patients still alive")
lines(stopanydummy[2],col = 2)
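To connect this to the layout in the question, here is a minimal sketch with simulated data. The column names (category, stage, died), the three groups, and the assumption that every individual died at its recorded stage are all made up for illustration; if some individuals survived all 5 stages, they would get died = 0 instead.

```r
library(survival)

set.seed(1)  # simulated data, purely illustrative

# Hypothetical layout: one row per individual, as in the question
dat <- data.frame(
  category = factor(rep(c("control", "cat1", "cat2"), each = 100)),
  stage    = sample(1:5, 300, replace = TRUE)
)
# Assumption: every individual died at the recorded stage (no censoring)
dat$died <- 1

# One Kaplan-Meier curve per category
km <- survfit(Surv(stage, died) ~ category, data = dat)
print(km)  # one row per category: n, events, median survival stage

# Plot all curves at once, one colour per category
plot(km, col = 1:3, xlab = "stage", ylab = "proportion still alive")
legend("topright", legend = names(km$strata), col = 1:3, lty = 1)
```

Passing the whole survfit object to plot draws every group in one call, which is easier to manage than plot plus lines once there are 15 categories.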

To test whether survival curves differ, you can use the log-rank test implemented in the function survdiff; it also handles observations that are lost to follow-up (right-censored). To estimate a hazard ratio, you can use a Cox regression:

coxfitsimple <- coxph(Surv(treatment_duration, stopany) ~ seropositive, data = dummyex)
summary(coxfitsimple)
Call:
coxph(formula = Surv(treatment_duration, stopany) ~ seropositive, 
    data = dummyex)

  n= 50, number of events= 28 

                coef exp(coef) se(coef)      z Pr(>|z|)
seropositive -0.3750    0.6873   0.4074 -0.921    0.357

             exp(coef) exp(-coef) lower .95 upper .95
seropositive    0.6873      1.455    0.3093     1.527

Concordance= 0.567  (se = 0.056 )
Rsquare= 0.017   (max possible= 0.976 )
Likelihood ratio test= 0.88  on 1 df,   p=0.348
Wald test            = 0.85  on 1 df,   p=0.3573
Score (logrank) test = 0.86  on 1 df,   p=0.3546

Here the estimate suggests that seropositive = 1 has a protective effect (hazard ratio = 0.69), but the p-value is high (p ≈ 0.36) and the 95% confidence interval (0.31 to 1.53) includes 1, so there is no statistically significant difference between the two curves.
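For several categories plus a control, as in the question, the same two tools could be applied roughly as follows. This is only a sketch on simulated data: the column names, the group labels, the choice of "control" as reference level and the Holm correction are assumptions for illustration, not something given in the question.

```r
library(survival)

set.seed(2)  # simulated data, purely illustrative

# Hypothetical data: a control group and two categories, 100 individuals each.
# "cat1" is simulated to die at earlier stages than the other groups.
dat <- data.frame(
  category = factor(rep(c("control", "cat1", "cat2"), each = 100),
                    levels = c("control", "cat1", "cat2")),  # control = reference
  stage    = c(sample(1:5, 100, replace = TRUE),
               sample(1:5, 100, replace = TRUE, prob = c(5, 4, 3, 2, 1)),
               sample(1:5, 100, replace = TRUE)),
  died     = 1  # assumption: every individual died at the recorded stage
)

# Global log-rank test: do the curves differ at all?
lr <- survdiff(Surv(stage, died) ~ category, data = dat)
print(lr)
p_global <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

# Cox regression: hazard ratio of each category relative to the control
fit <- coxph(Surv(stage, died) ~ category, data = dat)
hr  <- exp(coef(fit))

# Per-category p-values, adjusted for multiple comparisons (Holm)
p_adj <- p.adjust(summary(fit)$coefficients[, "Pr(>|z|)"], method = "holm")
print(cbind(hazard_ratio = hr, p_adjusted = p_adj))
```

A hazard ratio above 1 for a category means individuals in it die at a faster rate than the control. With 15 categories, some correction for multiple comparisons is usually worth considering; Holm is just one option.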
