
I'm doing a survival analysis of how long individual components remain in the source code of a software project, but some of these components are being dropped by the `survfit` function.

This is what I'm doing:

library(survival)
data <- read.table(text = "component_id weeks removed
1              1       1
2              1       1
3              1       1
4              1       1
5              1       1
6              1       1
7              1       1
8              2       0
9              2       0
10              2       0
11              2       0
12              2       1
13              2       1
14              2       0
15              2       0
16              2       0
17              2       0
18              2       0
19              2       0
20              2       1
21              2       1
22              2       0
23              2       0
24              3       1
25              3       1
26              3       1
27              3       1
28              7       1
29              7       1
30             14       1
31             14       1
32             14       1
33             14       1
34             14       1
35             14       1
36             14       1
37             14       1
38             14       1
39             14       1
40             14       1
41             14       1
42             14       1
43             14       1
44             14       1
45             14       1
46             14       1
47             14       1
48             40       1
49             40       1
50             40       1
51             40       1
52             48       1
53             48       1
54             48       1
55             48       1
56             48       1
57             48       1
58             48       1
59             48       1
60             56       1
61             56       1
62             56       1
63             56       1
64             56       1
65             56       1
66             56       1
67             56       1
68             56       1
69             56       1", header = TRUE)

fit <- survfit(Surv(data$weeks, data$removed) ~ 1)
summary(fit, censored=TRUE)

And this is the output:

Call: survfit(formula = Surv(data$weeks, data$removed) ~ 1)

time n.risk n.event survival std.err lower 95% CI upper 95% CI
   1     69       7    0.899  0.0363        0.830        0.973
   2     62       4    0.841  0.0441        0.758        0.932
   3     46       4    0.767  0.0533        0.670        0.879
   7     42       2    0.731  0.0567        0.628        0.851
  14     40      18    0.402  0.0654        0.292        0.553
  40     22       4    0.329  0.0629        0.226        0.478
  48     18       8    0.183  0.0520        0.105        0.319
  56     10      10    0.000     NaN           NA           NA

I was expecting the number of events to be 69 but I get 12 subjects dropped.
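
For completeness, here is a quick tally of the `removed` indicator in the data above (base R only, reusing the `data` frame already loaded):

# 1 = component removed (event observed), 0 = still present at last observation
table(data$removed)
#  0  1
# 12 57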

I initially thought I was misusing the package functions, so I tried a `type="interval2"` approach, following a similar situation, but the drops keep happening, now with oddly non-integer subject and event counts:

# Right endpoint for each subject: the removal time if the event was observed,
# NA (open-ended) otherwise
as.t2 <- function(i, data) if (data$removed[i] == 1) data$weeks[i] else NA
size  <- length(data$weeks)
t1    <- data$weeks
t2    <- sapply(1:size, as.t2, data = data)
interval_fit <- survfit(Surv(t1, t2, type = "interval2") ~ 1)
summary(interval_fit, censored = TRUE)

Next, I found what I'd call a mid-air explanation that clarifies the situation a bit further. I understand the drops are caused by non-censored subjects appearing after a "constant censoring time", but again, why?

That somehow led me to dig deeper and read about right truncation, and I realized that kind of study maps very closely to the drops I'm experiencing. Here's Klein & Moeschberger:

Truncation of survival data occurs when only those individuals whose event time lies within a certain observational window $(Y_L,Y_R)$ are observed. An individual whose event time is not in this interval is not observed and no information on this subject is available to the investigator.

Right truncation occurs when $Y_L$ is equal to zero. That is, we observe the survival time $X$ only when $X \leq Y_R$.

From my perspective, these drops carry important information for my study regardless of their time of entry.

How can I stop the drops?

elhoyos
  • I have only had grief with R's `survival` package. It is far easier to solve the problem using the advice on the survival analysis page of Wikipedia and R's `optim` function, using the full negative log-likelihood described there. You will need a hazard function, and there are several to choose from, also described in Wikipedia. – Peter Leopold Apr 10 '19 at 16:30
  • Any chance we could be overlooking any option in the `survival` package? – elhoyos Apr 11 '19 at 15:14
  • If there is, I never found it. I don't understand the survival package data management paradigm. I suggest you find a worked example doing exactly what you want to do, then hack the example to suit. (That's a stack overflow answer, which is not really what we like to pride ourselves in here at Cross Validated. Sigh, I suppose I should give you an official answer -- a full method complete with R implementation -- to supplant the survival package. That's the Cross Validated way. But not today. Sigh.) – Peter Leopold Apr 11 '19 at 18:48
  • And to clarify, by "right-truncated" you mean "right-censored" in the standard survival parlance? These are the survivors -- the ones who did not "transition" -- at the end of the study, right? – Peter Leopold Apr 11 '19 at 19:02
  • I'm referring to truncation as defined by Klein & Moeschberger in Techniques for Censored and Truncated Data, quoted above. I myself do not understand these two concepts well, but according to the authors, censoring refers to observations that are only known to lie within an interval, whereas in truncation such subjects are not even accounted for when they fall outside the interval. Studies can feature both conditions. In my case, I do have "right-censored" observations that are "right-truncated". I still do not understand why some experiments do not truncate these censored observations. – elhoyos Apr 12 '19 at 05:28
  • I don't mean to nitpick, but I want to get clearer about terminology. I think that will help solve your problem. It looks like you have right-censoring, not truncation, as you do not appear to have a single upper limit of follow-up time for all observations. Using Klein & Moeschberger's terminology, you don't have a $Y_R$ value that is fixed. I say this because you have quite a few rows among IDs 8-23 with a follow-up time of 2 weeks and no `removed` event recorded, and you have many observations of more than 2 weeks. Thus we have right-censoring, not truncation. – Gregor Thomas Apr 15 '19 at 15:25
  • I'm also surprised by your statement *"I was expecting the number of events to be 69 but I get 12 subjects dropped."* I think we need to clarify *subjects* vs. *events*. Your data has 69 rows, each with a unique ID, so you have 69 subjects. In the `removed` column, which you use to mark events, there are 57 1s. So among your 69 subjects, you observe 57 events; 12 subjects do not have events in your data. This is a simple description of your data, nothing to do with the `survival` package or the `survfit` function. – Gregor Thomas Apr 15 '19 at 15:29
  • All 69 subjects are included in the `survfit` results; you can see them in the `n.risk` column. Those numbers match your data by week. No subjects are dropped until their weeks of observation end. If you want, I can try to write this up as an answer... – Gregor Thomas Apr 15 '19 at 15:31
  • I think part of the confusion here is the definition of the `removed` column. Does this indicate removal due to loss to follow-up or the occurrence of the event of interest? In the current dataset, there is not a distinction between a censoring event and your event of interest. – Raoul Duke Apr 16 '19 at 19:32
  • @elhoyos Looks like right-censored data to me. Nothing is dropped. Can you update and clarify your question? – adibender Apr 27 '19 at 15:15
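
A quick numeric check of what Gregor Thomas describes in the comments (a minimal sketch, reusing the `data` frame and `fit` object defined above):

# All 69 rows are used: 57 events plus 12 right-censored subjects
sum(data$removed == 1)   # 57
sum(data$removed == 0)   # 12

# Number at risk at each observed event time, reconstructed from the raw data;
# these values match the n.risk column of summary(fit)
event_times <- sort(unique(data$weeks[data$removed == 1]))
sapply(event_times, function(t) sum(data$weeks >= t))
# 69 62 46 42 40 22 18 10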

0 Answers