
I have a question about multivariable Cox regression analysis that includes non-binary categorical variables. My data consist of several variables; some are binary (such as sex and age over 70), whereas others are not (for example, ECOG).

I tried both the analyse_multivariate function and the coxph function, but it seems that I can only get a single overall hazard ratio for a non-binary categorical variable. I'd like both the overall result for the variable and the individual hazard ratios for its categories (for example, hazard ratios for ECOG 0, ECOG 1, ECOG 2, and for ECOG overall).

This is what I tried:

(1)

ECOG = as.factor(df$ECOG)
analyse_multivariate(data=df, 
                     time_status = vars(df$OS, df$survival_status==1),
                     covariates = vars(df$age70, df$sex, ECOG),
                     reference_level_dict = c(ECOG==0))

and the result is like this:

Hazard Ratios:
factor.id  factor.name  factor.value     HR  Lower_CI  Upper_CI  Inv_HR  Inv_Lower_CI  Inv_Upper_CI
df$age70   df$age70     <continuous>   1.07      0.82      1.41    0.93          0.71          1.22
ECOG:4     ECOG         4              1.13      0.16      8.19    0.89          0.12          6.43
df$sex     df$sex       <continuous>   1.87      0.96      3.66    0.53          0.27          1.04
ECOG:1     ECOG         1              2.14      1.63      2.81    0.47          0.36          0.61
ECOG:3     ECOG         3             12.12      7.83     18.76    0.08          0.05          0.13
ECOG:2     ECOG         2             13.72      4.92     38.26    0.07          0.03           0.2

(2)

analyse_multivariate(data=df, 
                     time_status = vars(df$OS, df$survival_status==1),
                     covariates = vars(df$age70, df$sex, df$ECOG),
                     reference_level_dict = c(ECOG==0))

and the result is:


Hazard Ratios:
factor.id  factor.name  factor.value     HR  Lower_CI  Upper_CI  Inv_HR  Inv_Lower_CI  Inv_Upper_CI
df$age70   df$age70     <continuous>   0.89      0.68      1.16    1.13          0.86          1.47
df$sex     df$sex       <continuous>   1.87      0.96      3.65    0.53          0.27          1.04
df$ECOG    df$ECOG      <continuous>    1.9      1.69      2.15    0.53          0.47          0.59

Does it make sense to take the p-value for ECOG as a whole from (2), consider ECOG a significant variable if that p-value is <0.05, and combine it with the individual hazard ratios for each ECOG status from (1)?

For example, to generate a table like the following:

                  p-value   0.01
ECOG 1   Reference  
ECOG 2   13.72 (4.92-38.26) 
ECOG 3   12.12 (7.83-18.76) 
ECOG 4   1.13 (0.16-8.19)   

I believe there are better solutions but couldn't find one.

Any comments would be appreciated! Thank you in advance.

1 Answer


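The short answer is no. In (2), ECOG is treated as a continuous covariate, meaning you assume the log hazard has a linear relationship with ECOG, whereas in (1) every level (1 to 4) is allowed to have its own effect relative to the reference level. The p-value from (2) therefore belongs to a different model than the one that produced the hazard ratios in (1), so the two should not be combined in one table. To test the variable ECOG collectively, you can do an anova: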

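library(survivalAnalysis)
library(dplyr)  # provides %>% and vars()

# Example data: treat ECOG performance status and sex as categorical
data = survival::lung
data$ECOG = factor(data$ph.ecog)
data$sex = factor(data$sex)

# Multivariable Cox model with ECOG entered as a factor
fit1 = data %>%
  analyse_multivariate(vars(time, status),
                       covariates = vars(age, sex, ECOG, wt.loss))

# Sequential likelihood-ratio tests; the ECOG row tests all levels jointly
anova(fit1$coxph)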
Analysis of Deviance Table
 Cox model: response is Surv(time, status)
Terms added sequentially (first to last)

         loglik   Chisq Df Pr(>|Chi|)   
NULL    -675.02                         
age     -672.36  5.3325  1   0.020931 * 
sex     -667.82  9.0851  1   0.002577 **
ECOG    -660.26 15.1127  3   0.001723 **
wt.loss -659.31  1.9036  1   0.167680   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
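
Note that the table above adds terms sequentially, so the ECOG row is adjusted only for the terms listed before it. If you want the per-level hazard ratios and a single overall p-value from the same model, one option is to fit the model with coxph directly and compare it against a model without ECOG. The following is only a minimal sketch on the lung data, not your data: the object names (d, full, reduced, hr, lrt) are made up for illustration, and rows with missing values are dropped first so that both models use the same patients.

```r
library(survival)

# Keep complete rows only, so both models are fitted to the same observations
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
d$ECOG <- relevel(factor(d$ph.ecog), ref = "0")  # ECOG 0 is the reference level
d$sex  <- factor(d$sex)

# Model with ECOG as a factor, and the same model without ECOG
full    <- coxph(Surv(time, status) ~ age + sex + ECOG, data = d)
reduced <- coxph(Surv(time, status) ~ age + sex, data = d)

# Per-level hazard ratios (each level vs. ECOG 0) with 95% confidence intervals
hr <- summary(full)$conf.int[, c("exp(coef)", "lower .95", "upper .95")]
print(round(hr, 2))

# Overall test for ECOG: likelihood-ratio test of the two nested models
lrt <- anova(reduced, full)
print(lrt)
```

The rows of hr named ECOG1, ECOG2, ... give the individual hazard ratios for the table, and the single p-value in lrt is the overall test for ECOG with one degree of freedom per non-reference level.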
StupidWolf
  • I agree with Stupid. Testing ECOG overall makes little sense since this variable is NOT a continuous one. Treating the overall effect means you are assuming it's at best an ordinal variable with equal differences between the values, and that's pushing the bounds of reason. No oncologist in their right mind would agree with that assumption. – Edward Mar 14 '20 at 12:49
  • StupidWolf and Edward, thank you for your comments! As Edward pointed out, my original variables did not include ECOG like the example above but I modified the names of my variables just for the question - but it seems that I only made you confused. I strongly agree with your comment! Thank you. – user11602587 Mar 14 '20 at 14:14