I'm performing survival analysis, and I want to store a Surv object as its own column in a data.table. Although Surv objects are treated as vectors, I can't use one to make a new column because it is actually a two-column matrix. Is there an elegant way to include Surv objects without splitting them into separate columns?

This is what a Surv object looks like.

DT[,Surv(time, status)]
#>  [1]   9   13   13+  18   23   28+  31   34   45+  48  161+   5    5    8 
#> [15]   8   12   16+  23   27   30   33   43   45

Here is an example of what I want to do:

library(data.table)
library(survival)

DF <- as.data.frame(survival::aml)
DT <- as.data.table(survival::aml)

# Does work
DF$survival <- Surv(DF$time, DF$status)

# Does not work
DT[,survival:=Surv(time, status)]
Dewey Brooke
  • There's an outstanding issue to do exactly this on the GitHub page. I spent a few minutes trying to fix the issue but couldn't track down what's going on. – MichaelChirico Aug 24 '19 at 01:06
  • Sounds like an X-Y problem, i.e., "I want to do X by way of Y (which is not allowed by syntax constraints)". So why not tell us what you want to do rather than telling us how to do it? – IRTFM Aug 24 '19 at 02:07
  • @42- Tell me if this makes things more clear. Why can I store a `Surv` object as a vector in a data.frame but I cannot do the same in a data.table? Is this impossible? If not, is there a syntax nuance I'm missing that makes it work? – Dewey Brooke Aug 24 '19 at 02:55
  • Dataframes accept matrices. Datatables do not. – IRTFM Aug 24 '19 at 04:31
  • @42- Can you please be more helpful? I'm split between bench work and bioinformatics, and I'm too tired to figure out how I was supposed to ask that question. – Dewey Brooke Aug 24 '19 at 06:19
  • Describe the problem. It's not clear that you actually need to create a Surv object inside a data.table. – IRTFM Aug 24 '19 at 06:29
  • This seems to be an issue with the `data.table` syntax, as stated by the other commenters. But as you can create a `data.frame` from the object, a very quick-and-dirty solution would be `DT <- as.data.frame(survival::aml); setDT(DT)`. – Oliver Aug 24 '19 at 06:38
  • @42- Thanks! I'm trying to validate genes we think are prognostic in ovarian cancer using datasets from the Gene Expression Omnibus (GEO) (I'm aware of `curatedOvarianData` from Bioconductor). I have collected dozens of datasets and created a suite of functions for analyzing them over the past several months. I wanted to include the `Surv` object in the data.table to consolidate the size of the datasets and skip an analysis step. Although I could find a different solution, I was surprised that I couldn't and figured this would be a good time to understand why it didn't work. – Dewey Brooke Aug 24 '19 at 20:07

1 Answer

It's not yet clear what the underlying plan is for such a construction, but if the goal is to do survival modeling inside the data.table environment, then constructing a Surv object separately is not necessary. It helps to get comfortable with putting complete expressions in the data.table j-position:

> DT[ , coxph( Surv(time, status) ~ 1, data=.SD) ]
Call:  coxph(formula = Surv(time, status) ~ 1, data = .SD)

Null model
  log likelihood= -42.72484 
  n= 23 

The data.table function creates an environment where column names get evaluated without quotes:

> DT[ , summary(coxph( Surv(time, status) ~ x), data=.SD) ]
Call:
coxph(formula = Surv(time, status) ~ x)

  n= 23, number of events= 18 

                 coef exp(coef) se(coef)     z Pr(>|z|)  
xNonmaintained 0.9155    2.4981   0.5119 1.788   0.0737 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

               exp(coef) exp(-coef) lower .95 upper .95
xNonmaintained     2.498     0.4003    0.9159     6.813

Concordance= 0.619  (se = 0.063 )
Likelihood ratio test= 3.38  on 1 df,   p=0.07
Wald test            = 3.2  on 1 df,   p=0.07
Score (logrank) test = 3.42  on 1 df,   p=0.06

In fact, constructing Surv objects separately, outside the coxph call, is a practice that regularly generates questions on the R-help mailing list, because an object built that way has globalenv() as its environment rather than the data frame offered to coxph. Terry Therneau, the author of the survival package, warns people NOT to make separate Surv objects. This is entirely separate from any issue with storing matrices in data.table, but hopefully it will reduce the level of frustration with this barrier.
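
As a rough sketch of that advice (not the only way to do it), the snippet below keeps time and status as ordinary columns and builds the Surv object only inside the model call. It uses the aml data from the question. The names s, fit, and median_time are illustrative and do not come from the original post.

```r
library(data.table)
library(survival)

DT <- as.data.table(survival::aml)

# A Surv object is a two-column matrix (time, status) with class "Surv",
# which is why it does not map onto a single data.table column.
s <- Surv(DT$time, DT$status)
is.matrix(s)  # TRUE
dim(s)        # 23 2

# Build the Surv object inside the model call so that it is evaluated
# against the data supplied to coxph.
fit <- coxph(Surv(time, status) ~ x, data = DT)
coef(fit)

# The same pattern in the j-position, one Kaplan-Meier fit per group.
# `median_time` is an illustrative column name.
DT[, .(median_time = summary(survfit(Surv(time, status) ~ 1, data = .SD))$table[["median"]]),
   by = x]
```

Because the Surv object never has to live in a column, the data.table itself stays a plain table of time, status, and covariates.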

IRTFM