2

I need to calculate the linear predictor of a Cox PH model manually.

I can get continuous and binary variables to match the output of predict.coxph (specifying 'lp') but I can't seem to figure out how to calculate it for categorical variables with more than 2 levels.

My aim is to assess calibration of a published model in my own data-I only have coefficients so need to be able to do this by hand.

This post on stackoverflow describes how to calculate for continuous variables...

(Coxph predictions don't match the coefficients)

Any advice would be appreciated! Thanks

R example...

URL   <- "http://socserv.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
Rossi <- read.table(URL, header=TRUE)                  
summary(Rossi[,c("week", "arrest", "fin")])
 #      week           arrest        fin     
 # Min.   : 1.00   Min.   :0.0000   no :216  
 # 1st Qu.:50.00   1st Qu.:0.0000   yes:216  
 # Median :52.00   Median :0.0000            
 # Mean   :45.85   Mean   :0.2639            
 # 3rd Qu.:52.00   3rd Qu.:1.0000            
 # Max.   :52.00   Max.   :1.0000 

library(survival)     

#for binary variable
fitCPH <- coxph(Surv(week, arrest) ~ fin, data=Rossi)    #Cox-PH model
(coefCPH <- coef(fitCPH))                               # estimated coefficients
#   finyes 
#-0.3690692 

head(predict(fitCPH,type="lp"))
#[1]  0.1845346  0.1845346  0.1845346 -0.1845346  0.1845346  0.1845346

head(((as.numeric(Rossi$fin) - 1) - mean(as.numeric(Rossi$fin) - 1))    * coef(fitCPH))
#[1]  0.1845346  0.1845346  0.1845346 -0.1845346  0.1845346  0.1845346

#for categorical variable
set.seed(170981)
Rossi$categorical.example <-    as.factor(sample(1:3,nrow(Rossi),replace = TRUE))
summary(Rossi[,c("week", "arrest", "categorical.example")])
 #      week           arrest       categorical.example
 # Min.   : 1.00   Min.   :0.0000   1:156              
 # 1st Qu.:50.00   1st Qu.:0.0000   2:138              
 # Median :52.00   Median :0.0000   3:138              
 # Mean   :45.85   Mean   :0.2639                      
 # 3rd Qu.:52.00   3rd Qu.:1.0000                      
 # Max.   :52.00   Max.   :1.0000               

fitCPH2 <- coxph(Surv(week, arrest) ~ categorical.example, data=Rossi)    #Cox-PH model
(coefCPH2 <- coef(fitCPH2))  
#categorical.example2 categorical.example3 
#           -0.181790            -0.103019 

head(predict(fitCPH2,type="lp"))
#[1]  0.09098066 -0.01203832 -0.01203832  0.09098066 -0.09080938 -0.09080938

#How to calculate manually??
Michelle
  • 21
  • 2

1 Answers1

0

Think I've figure this out if anyone is interested...

#How to calculate manually??
dv1 <- as.numeric(Rossi$categorical.example)-1    #make 0,1,2 rather than 1,2,3
dv1[dv1==2] <- 0

dv2 <- as.numeric(Rossi$categorical.example)-2    #make -1,0,1 rather than 1,2,3
dv2[dv2==-1] <- 0

meandv1  <- mean(dv1)   
meandv2  <- mean(dv2) 

head(((dv1-meandv1)*coefCPH2 [1])+((dv2-meandv2)*coefCPH2 [2]))
#[1]  0.09098066 -0.01203832 -0.01203832  0.09098066 -0.09080938 -0.09080938

all(round(predict(fitCPH2,type="lp"),5)==round(((dv1-meandv1)*coefCPH2 [1])+((dv2-meandv2)*coefCPH2 [2]),5))
#[1] TRUE
Michelle
  • 21
  • 2
  • You would give your readers a better understanding if you explained that factors are code as integers ascending from 1 and that the 1 coded level is considered the reference level when using treatment contrasts. Changing the contrast options will alter the model matrix and lead to different coefficients. – IRTFM Jan 14 '23 at 09:19