
I am a newbie to R and I am trying to perform a logistic regression on a set of clinical data. My independent variables are AGE, TEMP, WBC, NLR, CRP, PCT, ESR, IL6, and TIME. My dependent variable is binomial CRKP.

After fitting the model with glm, I received this warning message:

glm.fit <- glm(CRKP ~ AGE + TEMP + WBC + NLR + CRP + PCT + ESR, data = cv, family = binomial, subset = train)

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

I searched for potential causes and used the corrplot function to check whether multicollinearity could be leading to overfitting.
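For reference, this is roughly how I produced the plot (a minimal sketch; it assumes the data frame is named cv as above and that the corrplot package is installed):

    # Correlation matrix of the numeric predictors, visualised with corrplot
    library(corrplot)
    preds <- c("AGE", "TEMP", "WBC", "NLR", "CRP", "PCT", "ESR", "IL6")
    cor_mat <- cor(cv[, preds], use = "pairwise.complete.obs")
    corrplot(cor_mat, method = "number")
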

This is what I have as the plot:

[Correlation plot of the predictors]

The correlation plot shows that my ESR and PCT variables are highly correlated; similarly, CRP and IL6 are highly correlated. But they are all important clinical indicators that I would like to keep in the model.

I tried using the VIF to selectively discard variables, but wouldn't that introduce bias? I would also have to sacrifice some of my variables of interest.

Does anyone know what I can do? Please help. Thank you!

  • How many observations does this model use? A binomial GLM can achieve a perfect fit with only a handful of predictors, depending on your number of observations. – VitaminB16 May 31 '21 at 21:07
  • I have 51 observations in total. – Taicheng Jin Jun 01 '21 at 22:07
  • See these: https://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression, https://stats.stackexchange.com/questions/336424/issue-with-complete-separation-in-logistic-regression-in-r – VitaminB16 Jun 01 '21 at 23:09

1 Answer


You have a multicollinearity problem but don't want to drop variables. In that case, you can use Partial Least Squares (PLS) or Principal Component Regression (PCR): both replace the correlated predictors with a smaller set of uncorrelated components, so every original variable still contributes to the model.
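As a minimal sketch of the PCR approach for a binary outcome: run PCA on the predictors, then fit a logistic regression on the leading components. This assumes the data frame cv and a vector of training row indices train, as in the question; the 90% variance cutoff is just an illustrative choice.

    preds <- c("AGE", "TEMP", "WBC", "NLR", "CRP", "PCT", "ESR", "IL6")

    # PCA on the (centered, scaled) training predictors
    pc <- prcomp(cv[train, preds], center = TRUE, scale. = TRUE)

    # Keep enough components to explain ~90% of predictor variance
    k <- which(cumsum(pc$sdev^2) / sum(pc$sdev^2) >= 0.90)[1]

    # Logistic regression on the component scores instead of the raw predictors
    scores <- as.data.frame(pc$x[, 1:k, drop = FALSE])
    scores$CRKP <- cv$CRKP[train]
    fit <- glm(CRKP ~ ., data = scores, family = binomial)

    # Score held-out rows by projecting them onto the same components
    new_scores <- as.data.frame(predict(pc, newdata = cv[-train, preds]))[, 1:k, drop = FALSE]
    p_hat <- predict(fit, newdata = new_scores, type = "response")

For PLS, the pls package offers similar functionality, with components chosen to be predictive of the response rather than merely summarizing the predictors. Note, though, that with only 51 observations the warning may also reflect (quasi-)separation, as the comments suggest, so a penalized fit is worth considering as well.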