0

I ran the following Cox model and got 1526679 deleted observations, which is a large portion of my data.

Call: coxph(formula = Surv(time1sec, time2sec, event) ~ gain + 
Buy + Lev + TP + frailty(ID), data)

n= 73322, number of events= 73322 (1526679 observations deleted due to missingness)

I am not sure why these observations were deleted. I am certain that these values are there, and are not empty. This started happening when I added the ID as a frailty term.

Any ideas what might be going on here?

zx8754
finstats

2 Answers

0

In standard regression problems (and regression-like ones such as Cox regression), the default way of handling missing values is simply to drop the rows that contain them. This applies both to the covariates and to the response you are trying to model.
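
As a rough illustration of that default, here is a minimal sketch using base R's `lm` on made-up data (the `toy` data frame and its values are hypothetical, not the asker's data). It assumes the session still uses R's default `na.action` of `na.omit`:

```r
# Hypothetical toy data: one missing response and one missing predictor.
toy <- data.frame(
  y = c(1.0, 2.1, NA, 4.2, 5.1, 6.3),
  x = c(1, 2, 3, NA, 5, 6)
)

# With the default na.action (na.omit), incomplete rows are dropped silently.
fit <- lm(y ~ x, data = toy)

n_used    <- nobs(fit)                # rows actually used in the fit
n_dropped <- length(na.action(fit))   # rows removed because of an NA
```

Comparing `n_used` with `nrow(toy)` is the same bookkeeping that `coxph` reports as "observations deleted due to missingness".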

I would start by explicitly verifying that every column you need is fully populated. The following code counts the missing values in each column:

apply(data, 2, function(x) length(which(is.na(x))))
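
An equivalent check can be sketched with `colSums` and `complete.cases`, shown here on a small made-up data frame (the `toy` object is hypothetical; the column names are borrowed from the question only for readability):

```r
# Hypothetical toy data with two incomplete rows.
toy <- data.frame(
  time1sec = c(0, 5, 10, NA),
  time2sec = c(5, 10, 20, 30),
  event    = c(1, 1, NA, 1),
  ID       = c("a", "a", "b", "b")
)

# Number of NA values in each column.
na_per_column <- colSums(is.na(toy))

# Number of rows with no NA in any column.
n_complete <- sum(complete.cases(toy))
```

If `n_complete` equals `nrow(toy)`, no row is missing a value in the raw columns, so any deletions must arise later, for example when the model frame is built.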

I would also verify that the Surv object itself does not contain any NAs, with the following code (this assumes the survival package is loaded and the columns live in data):

with(data, sum(is.na(Surv(time1sec, time2sec, event))))
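
The Surv check is worth doing separately because a Surv object can contain NAs even when the raw columns do not. A minimal sketch on made-up data (the `toy` data frame is hypothetical): to the best of my knowledge, `Surv()` in counting-process form marks a row as NA, with a warning, when its stop time is not greater than its start time.

```r
library(survival)

# Hypothetical toy data: every cell is filled in, but the third row has a
# stop time equal to its start time.
toy <- data.frame(
  time1sec = c(0, 5, 10, 20),
  time2sec = c(5, 10, 10, 30),
  event    = c(1, 1, 1, 1)
)

# The raw columns look complete.
n_complete <- sum(complete.cases(toy))

# Build the Surv object; the warning about stop <= start is suppressed here
# only to keep the example quiet.
surv_obj  <- suppressWarnings(with(toy, Surv(time1sec, time2sec, event)))
n_na_surv <- sum(is.na(surv_obj))
```

A nonzero `n_na_surv` alongside a full `n_complete` would point at the time variables rather than at literally empty cells.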
kblansit
  • op has done this with `dim(data[complete.cases(data),])` in the comments suggesting that there is no missing data at all – rawr Apr 13 '15 at 19:22
  • Got 0 NAs. This issue started to happen when I added the frailty(ID) variable. The model was working fine before adding that term. – finstats Apr 13 '15 at 19:22
0

Also, make sure that your data source contains only relevant rows. For instance, the data file I imported into R included many entries that were not part of the analysis (e.g. legends, keywords); R excludes those rows, and they are reported as deleted due to "missingness."