Cox Hazard Model: why are multiple factor levels treated as the reference?

Question

When fitting a multivariable Cox proportional hazards model (coxph) with categorical predictors, and after releveling all predictor variables with relevel(), two of my predictors have several levels whose coefficients are NA, as if multiple levels were being treated as the "reference". This eliminates several of the comparisons I need. Specifically, it happens for the "condition" and "pathogenpres" factors (output below). Is this a common problem with a known solution? Do I have too many predictors (n=6)?

Code below:

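library(survival)

mydata <- read.csv("survivaldata.csv")

# relevel() only works on factors, so convert each column with factor() first
# (read.csv() leaves numeric codes as numbers and, in R >= 4.0, text as character).
mydata$concentration1 <- relevel(factor(mydata$concentration1), ref = "1")
mydata$concentration2 <- relevel(factor(mydata$concentration2), ref = "1")
mydata$contaminant <- relevel(factor(mydata$contaminant), ref = "Control")
mydata$pathogenpres <- relevel(factor(mydata$pathogenpres), ref = "2")
mydata$fam <- relevel(factor(mydata$fam), ref = "1")
mydata$condition <- relevel(factor(mydata$condition), ref = "gg")

# Surv() treats event = 1 as an observed event and 0 as censored;
# check that the 'censored' column is coded that way.
surob1 <- Surv(time = mydata$days.surv, event = mydata$censored)

# conf.type is a survfit() argument, not a coxph() one, so it is dropped here.
fit.coxph1 <- coxph(surob1 ~ concentration1 + concentration2 + contaminant +
                      pathogenpres + condition + fam, data = mydata)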

> summary(fit.coxph1)
Call:
coxph(surob1~concentration1+concentration2+contaminant+pathogenpres+condition+fam,data=mydata)

  n= 188, number of events= 83 
   (8 observations deleted due to missingness)

                    coef exp(coef) se(coef)      z Pr(>|z|)
concentration12  0.62577   1.86968  0.35103  1.783   0.0746 .
concentration13  0.69556   2.00483  0.35593  1.954   0.0507 .
concentration22 -0.15399   0.85728  0.31970 -0.482   0.6300
concentration23 -0.26729   0.76545  0.31970 -0.836   0.4031
contaminant1     0.74756   2.11185  0.69261  1.079   0.2804
contaminant2     0.40921   1.50563  0.69438  0.589   0.5556
condition1            NA        NA  0.00000     NA       NA
condition2            NA        NA  0.00000     NA       NA
condition3            NA        NA  0.00000     NA       NA
condition4       0.53134   1.70120  0.76529  0.694   0.4875
pathogenpres1         NA        NA  0.00000     NA       NA
fam2             0.07799   1.08112  0.26550  0.294   0.7689
fam3            -0.03410   0.96647  0.28008 -0.122   0.9031
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Without a better description of the data we would only be offering WAGs (wild-ass guesses.) — IRTFM, Apr 12 '20 at 01:34
@42- The explanatory variables are categorical factors with 2 to 5 levels each. Condition is a treatment condition, which categorizes each subject based on concentration (12, 13, 22, or 23). Concentration is the concentration of a [...]. Pathogenpres1 means a pathogen is present (=1); absence is 0. Contaminant1 and contaminant2 are types of contaminants (each subject was exposed to 1 of 3). Fam is the clutch each subject came from (1 of 3). — SpencerS, Apr 13 '20 at 13:08
I’m guessing that the NAs are due to those levels being completely predictable from other variables. Complete collinearity is recognized by the regression machinery, and redundant variables are given NA coefficients. — IRTFM, Apr 13 '20 at 15:04
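
One way to check the collinearity guess is to cross-tabulate the suspect factor against the others and compare the rank of the design matrix with its number of columns. The sketch below uses made-up toy data, not the asker's file. It assumes, per the description in the comments, that condition is simply the combination of the two concentration factors; the names toy, tab, X and rank_X are illustrative only.

```r
# Toy data (hypothetical): every combination of two 3-level concentration
# factors, repeated 4 times, with 'condition' defined as their combination.
toy <- expand.grid(concentration1 = factor(1:3), concentration2 = factor(1:3))
toy <- toy[rep(seq_len(nrow(toy)), each = 4), ]
toy$condition <- factor(paste0(toy$concentration1, toy$concentration2))

# Cross-tabulation: if each condition level falls in exactly one
# concentration1 column, condition determines concentration1 completely.
tab <- table(toy$condition, toy$concentration1)
print(tab)

# Design matrix for the same right-hand side as a regression formula.
X <- model.matrix(~ concentration1 + concentration2 + condition, data = toy)

# A rank smaller than the number of columns means some columns are linear
# combinations of others; the model cannot estimate them and reports NA.
rank_X <- qr(X)$rank
cat("columns:", ncol(X), " rank:", rank_X, "\n")
```

On the real data, the same check would be something like with(mydata, table(condition, concentration1)) and with(mydata, table(pathogenpres, contaminant)). If a factor turns out to be fully determined by the others, dropping either it or the variables that define it from the formula should remove the NA rows.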
