1

I am trying to run a Cox proportional hazard model to determine the effect of treatment and covariates on the survival of individual plant species. Previously when I ran coxph with only the treatment (categorical/factor)

simacox <- coxph(Surv(Time, Event, type = c('right')) ~ Treatment,  data = rsima)

It ran fine, but when I added in (continuous) covariates I kept getting an error message:

simacox <- coxph(Surv(Time, Event, type = c('right')) ~ 
    Treatment+SLA+VLA+Thickness+Growth_Rate,  data = rsima)

Warning message: In fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge

Here is the data set: I'm not really sure is if it is caused by NA values or from another issue. I have looked into similar problems, but they generally arise because the Treatment is continuous and seems to be a different issue.

Plot ID Subplot Treatment   Column  Row Species Time    Event   Growth_Rate Area    SLA VLA Thickness
PC1 1   control A   7   SIMA    535 1   0.0132  NA  NA  NA  NA
PC1 2   control C   2   SIMA    829 0   0.0532  6   123.5312982 1.307927088 0.1005
PC1 3   control D   2   SIMA    535 1   0.0329  NA  NA  NA  NA
PC2 1   control A   7   SIMA    829 0   0.0236  0.75    192.6132404 1.49602026  0.135
PC2 2   control C   2   SIMA    829 1   0.0037  NA  NA  NA  NA
PC2 3   control D   2   SIMA    535 1   0.0099  NA  NA  NA  NA
PC3 1   control A   7   SIMA    152 1   0.0163  NA  NA  NA  NA
PC3 2   control C   2   SIMA    829 0   0.058   1   185.3606789 1.311713087 0.135
PC3 3   control D   2   SIMA    829 0   0.0097  0.75    96.12967467 1.392643765 0.1735
PC4 1   control A   7   SIMA    152 1   0.0109  NA  NA  NA  NA
PC4 2   control C   2   SIMA    120 1   0.0109  NA  NA  NA  NA
PC4 3   control D   2   SIMA    120 1   0.0217  NA  NA  NA  NA
PC5 1   control A   7   SIMA    92  1   0   NA  NA  NA  NA
PC5 2   control C   2   SIMA    152 1   0.0109  NA  NA  NA  NA
PC5 3   control D   2   SIMA    829 1   0.0009  NA  NA  NA  NA
PS1 1   shelter A   7   SIMA    829 0   0.0121  3.25    96.12967467 1.392643765 0.1735
PS1 2   shelter C   2   SIMA    829 1   0.0009  NA  NA  NA  NA
PS1 3   shelter D   2   SIMA    829 0   0.0435  11.75   119.0672131 1.26393576  0.2495
PS2 1   shelter A   7   SIMA    829 0   0.0508  6   128.8442116 1.744927272 0.1417
PS2 2   shelter C   2   SIMA    829 0   0.0193  1   163.722709  1.987793669 0.1045
PS2 3   shelter D   2   SIMA    829 0   0.0484  6.5 134.4099228 1.589451631 0.18
PS3 1   shelter A   7   SIMA    829 0   0.0363  9.5 184.2795579 1.450538059 0.1035
PS3 2   shelter C   2   SIMA    829 0   0.058   11  96.76593176 1.501929992 0.08
PS3 3   shelter D   2   SIMA    829 0   0.0193  2.25    124.317571  3.516426012 0.1295
PS4 1   shelter A   7   SIMA    829 0   0.0411  4.5 113.088867  2.203327018 0.149
PS4 2   shelter C   2   SIMA    535 1   0.0263  NA  NA  NA  NA
PS4 3   shelter D   2   SIMA    829 0   0.058   11  31.44098888 1.714225616 0.1595
PS5 1   shelter A   7   SIMA    829 0   0.0363  11.5    155.3209302 1.308096836 0.23875
PS5 2   shelter C   2   SIMA    829 0   0.0048  0.25    171.0465116 2.135961931 0.104
PS5 3   shelter D   2   SIMA    829 0   0.0266  5   178.9407945 1.599492384 0.0975
PW1 1   watered A   7   SIMA    829 1   0.0056  NA  NA  NA  NA
PW1 2   watered C   2   SIMA    829 0   0.0484  6.5 150.7782165 1.956811087 0.159
PW1 3   watered D   2   SIMA    829 0   0.0181  3   158.1184404 1.94474398  0.1935
PW2 1   watered A   7   SIMA    829 0   0.0351  8.5 148.9020752 1.482003075 0.2405
PW2 2   watered C   2   SIMA    829 0   0.0508  1.5 170.3944295 1.653449107 0.127
PW2 3   watered D   2   SIMA    829 1   0.0009  NA  NA  NA  NA
PW3 1   watered A   7   SIMA    829 0   0.0073  1   159.8682043 1.594187964 0.224
PW3 2   watered C   2   SIMA    120 1   0.0217  NA  NA  NA  NA
PW3 3   watered D   2   SIMA    829 0   0.0919  25  146.6362786 1.694286556 0.1325
PW4 1   watered A   7   SIMA    120 1   0.0109  NA  NA  NA  NA
PW4 2   watered C   2   SIMA    829 1   0.0009  NA  NA  NA  NA
PW4 3   watered D   2   SIMA    152 1   0.0163  NA  NA  NA  NA
PW5 1   watered A   7   SIMA    829 1   0.0009  NA  NA  NA  NA
PW5 2   watered C   2   SIMA    535 1   0.0266  1.5 162.8057554 2.065105317 0.94
PW5 3   watered D   2   SIMA    829 0   0.058   4   80.37696758 1.831219479 0.1195
Justin Luong
  • 31
  • 1
  • 11

1 Answers1

4

The issue

The problem is actually with Thickness; it's easy to verify that

fit <- coxph(Surv(Time, Event) ~ Thickness, data = rsima)

produces the warning

Warning message: In fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge

We can get some insight into convergence issues from ?coxph:

In certain data cases the actual MLE estimate of a coefficient is infinity, e.g., a dichotomous variable where one of the groups has no events. When this happens the associated coefficient grows at a steady pace and a race condition will exist in the fitting routine: either the log likelihood converges, the information matrix becomes effectively singular, an argument to exp becomes too large for the computer hardware, or the maximum number of interactions is exceeded. (Nearly always the first occurs.) The routine attempts to detect when this has happened, not always successfully. The primary consequence for he user is that the Wald statistic = coefficient/se(coefficient) is not valid in this case and should be ignored; the likelihood ratio and score tests remain valid however.

The explanation

If we take a look at rsima$Thickness we notice that most values are small (in the range 0.08 <= Thickness <= 0.2495) with one single value being Thickness = 0.94. This is very similar to the case described in the documentation, where Thickness is basically a discrete variable (with levels "low" and "high") and one group having nearly no events (the "high" group has only one event).

Based on this post on Cross Validated, it's useful to visualise the effect by plotting

library(survminer)
ggsurvplot(survfit(Surv(Time, Event) ~ (Thickness > median(Thickness, na.rm = T)), data = df), data = df)

enter image description here

What we're doing here is plotting the survival probability as a function of a dichotomised Thickness, with Thickness being either smaller than its median value (the red curve), or larger (the blue curve).

You can see the effect of Thickness on the survival probability, or rather, the absence of an effect of Thickness. For example, notice how there are no Event = 1 cases for small Thickness values, and there is only one Event = 1 case for large Thickness values.

In terms of fitting the model, it is impossible to obtain a robust estimate of the Thickness effect on the survival probability, and Thickness should be removed from the model prior to exploring additional continuous/discrete covariates.

Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • 1
    nice catch. Also, if my computations are correct, after omitting cases with `NA` values in the predictors or response, there are only 25 observations left to estimate 5 parameters: `library(dplyr); rsimb <- (rsima %>% select(Time,Event,Treatment,SLA,VLA,Thickness,Growth_Rate) %>% na.omit() ); nrow(rsimb)` – Ben Bolker Sep 18 '18 at 22:17
  • 1
    Thanks @BenBolker. Wouldn't it even be 6 effect parameters, since `Treatment` is a `factor` with 3 levels, which means 4 continuous parameters plus 2 more from the `factor` relative to an offset? – Maurits Evers Sep 18 '18 at 22:32
  • Thanks, that .94 is actually an error/typo in the data which should actually be .094 which puts in in range with all other measurements, which didn't seem to affect the error. Further this error still occurs when I remove just thickness from the code and also occurs with `VLA` – Justin Luong Sep 18 '18 at 22:54
  • Hi also when I use just `SLA` I get this error: `In fitter(X, Y, strats, offset, init, control, weights = weights, : Loglik converged before variable 2 ; beta may be infinite. ` Sorry I haven't had much formal training in stats other than what I have self taught. When you say one group having no event, do you mean that there are not enough non surviving (event = 1) individuals (which had no measurements) to use them as covariates? The only time no error comes up is when I only use `Growth_Rate`, which is present for every individual – Justin Luong Sep 18 '18 at 22:56
  • @JustinLuong I don't get any warnings/errors when I do `coxph(Surv(Time, Event) ~ SLA, data = rsima)`. So I'm not sure how you get the error. To answer your other question: Take a look at the KM curves. You can see how `Thickness` really plays little role in affecting the survival probability. Almost all subjects survive, irrespective of the `Thickness` value. – Maurits Evers Sep 18 '18 at 23:37
  • Hi Mauritis, sorry I should of been more clear, I have been using all of these including `SLA` as covariates and the `Treatment` as the main effect. The error occrus when i run `coxph(Surv(Time, Event) ~ Treatment + SLA, data = rsima)` . I see what you mean about the KM curve, but I guess what I mean is there are no data on individuals that are dead or not survived for `Thickness` so it would show no difference as shown in the KM curve right? In other words I only have `Thickness` for individuals that are still "alive" (no individuals with event = 1 have `Thickness` values) – Justin Luong Sep 19 '18 at 00:47
  • Just to be clear that "beta may be infinite" message is NOT an "error" but rather a "warning". It often appears in small data situations where single combinations of factor covariates give really outlandish estiamtes. – IRTFM Oct 31 '18 at 22:02