I'm trying to run a survival analysis with an ordered categorical variable, using the test.for.trend option of the ggsurvplot function from the survminer package. The p-value of the log-rank test for trend differs depending on whether my variable is numeric (1, 2, 3, 4) or an ordered factor ("1", "2", "3", "4"). Which format should I use, and why?

I also used the comp function from the survMisc package, because ggsurvplot seems to be based on it, and as expected I obtained the same results.

Thank you very much for your help.

Here is code to reproduce the issue:

library(survival)   # provides Surv() and survfit()
library(survminer)
data("larynx", package = "KMsurv")

# The same stage variable under four codings:
# numeric (original), character, factor 1..4, factor 4..1
larynx.table <- larynx
larynx.table$stage.cat <- as.character(larynx.table$stage)
larynx.table$stage.fact <- factor(larynx.table$stage.cat, levels = c("1", "2", "3", "4"))
larynx.table$stage.fact.inv <- factor(larynx.table$stage.cat, levels = c("4", "3", "2", "1"))

# One Kaplan-Meier fit per coding
fit1 <- survfit(Surv(time, delta) ~ stage, data = larynx.table)
fit2 <- survfit(Surv(time, delta) ~ stage.cat, data = larynx.table)
fit3 <- survfit(Surv(time, delta) ~ stage.fact, data = larynx.table)
fit4 <- survfit(Surv(time, delta) ~ stage.fact.inv, data = larynx.table)

# Plot each fit with the log-rank test for trend p-value
SA <- ggsurvplot(fit1, data = larynx.table, pval = TRUE, test.for.trend = TRUE)
SB <- ggsurvplot(fit2, data = larynx.table, pval = TRUE, test.for.trend = TRUE)
SD <- ggsurvplot(fit3, data = larynx.table, pval = TRUE, test.for.trend = TRUE)
SE <- ggsurvplot(fit4, data = larynx.table, pval = TRUE, test.for.trend = TRUE)

arrange_ggsurvplots(list(SA, SB, SD, SE), ncol = 2, nrow = 2)

Survival plots with numeric or categorical variables

  • This is probably better on Stack Exchange. But the short answer is that a numeric variable coded as you have described assumes a linear relationship between the dependent variable and the predictor (and is associated with 1 degree of freedom), whereas a character/factor variable does not (and is associated with n-1 df, where n is the number of levels in the factor). – Limey May 21 '21 at 16:33
  • Thank you Limey. When I fit a Cox model, I expect differences between numeric and categorical predictors for the reasons you explained. But when you generalize the log-rank test to more than 2 groups, you obtain the same p-value however the variable is coded, so I expected the same behavior from the log-rank test for trend. In the example I gave from the survMisc package, the predictor is the stage of a disease, which is an ordinal categorical variable, not a linear numeric one. Is it wrong to use it as numeric as they did (fit1)? – Kirilovsky Amos May 25 '21 at 13:53
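
The two points made in the comments above can be checked directly with the survival package. The following is only a minimal sketch, assuming the same larynx data from KMsurv as in the question; the object names (lr_num, lr_fact, cox_num, cox_fact) are made up for illustration. It does not reproduce the ggsurvplot or survMisc trend test itself. The 1-df score test from the numeric Cox fit is closely related to a log-rank trend test with scores 1 to 4, but it may not match it digit for digit, for example because of how ties are handled.

```r
library(survival)
data("larynx", package = "KMsurv")

# Factor version of the numeric stage variable (levels 1 to 4)
larynx$stage.fact <- factor(larynx$stage, levels = 1:4)

# Global log-rank test: the right-hand side only defines the groups,
# so numeric and factor coding should give the same statistic.
lr_num  <- survdiff(Surv(time, delta) ~ stage, data = larynx)
lr_fact <- survdiff(Surv(time, delta) ~ stage.fact, data = larynx)
all.equal(lr_num$chisq, lr_fact$chisq)

# Cox model: numeric coding fits a single slope (1 df), while factor
# coding fits one coefficient per non-reference level (3 df).
cox_num  <- coxph(Surv(time, delta) ~ stage, data = larynx)
cox_fact <- coxph(Surv(time, delta) ~ stage.fact, data = larynx)
summary(cox_num)$sctest   # score test: statistic, df, p-value
summary(cox_fact)$sctest
```

Comparing the two sctest outputs shows where the coding starts to matter: the grouping-based log-rank test ignores it, whereas any test that uses the numeric values as scores depends on them.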
