How do I combine two dataframes that include survival::Surv objects such that the fields in the resulting dataframe have the same class as the source data frames?

I have found that using rbind results in the Surv objects being converted to matrices. For example, I created df1 as follows:

library(survival)
df1 <- data.frame(obs = c('A','B','C','D','E')
                  , lo = c(10,20,30,40,50)
                  , hi = c(30,30,30,40,50))
df1$conc <- survival::Surv(df1$lo, df1$hi, type = "interval2")

Next, I check df1's contents and structure, as well as the class of df1$conc. Note that the str output reports conc as a Surv object.

> df1
  obs lo hi     conc
1   A 10 30 [10, 30]
2   B 20 30 [20, 30]
3   C 30 30       30
4   D 40 40       40
5   E 50 50       50

> str(df1)    
'data.frame':   5 obs. of  4 variables:
 $ obs : chr  "A" "B" "C" "D" ...
 $ lo  : num  10 20 30 40 50
 $ hi  : num  30 30 30 40 50
 $ conc: 'Surv' num [1:5, 1:3] [10, 30] [20, 30] 30       40       ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:3] "time1" "time2" "status"
  ..- attr(*, "type")= chr "interval"

> class(df1$conc)
[1] "Surv"

Next, I create df2 as a copy of df1 and rbind the two together as df3.

df2 <- df1
df3 <- rbind(df1,df2)

The structure of df3 looks almost the same as that of df1 above, but conc is now a plain numeric matrix and its type attribute is missing.

> str(df3)
'data.frame':   10 obs. of  4 variables:
 $ obs : chr  "A" "B" "C" "D" ...
 $ lo  : num  10 20 30 40 50 10 20 30 40 50
 $ hi  : num  30 30 30 40 50 30 30 30 40 50
 $ conc: num [1:10, 1:3] 10 20 30 40 50 10 20 30 40 50 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:3] "time1" "time2" "status"

Also note that df3$conc is no longer a Surv object:

> class(df3$conc)
[1] "matrix" "array"

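The content of df3 looks a little strange, because it prints the three columns of the underlying matrix (time1, time2, status) rather than formatted intervals, but that makes sense given how the survival package stores its data.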

> df3
   obs lo hi conc.time1 conc.time2 conc.status
1    A 10 30         10         30           3
2    B 20 30         20         30           3
3    C 30 30         30          1           1
4    D 40 40         40          1           1
5    E 50 50         50          1           1
6    A 10 30         10         30           3
7    B 20 30         20         30           3
8    C 30 30         30          1           1
9    D 40 40         40          1           1
10   E 50 50         50          1           1
greengrass62

1 Answer

We can use bind_rows from dplyr, which keeps conc as a Surv column:

library(dplyr)
df3 <- bind_rows(df1, df2)

df3
#   obs lo hi     conc
#1    A 10 30 [10, 30]
#2    B 20 30 [20, 30]
#3    C 30 30       30
#4    D 40 40       40
#5    E 50 50       50
#6    A 10 30 [10, 30]
#7    B 20 30 [20, 30]
#8    C 30 30       30
#9    D 40 40       40
#10   E 50 50       50
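
The printed output suggests that conc is still a Surv object, but a quick check makes that explicit. This is a minimal sketch, assuming survival and dplyr are installed and that the installed dplyr version handles Surv columns the way the output above shows; df1 is rebuilt here only so the snippet runs on its own.

```r
library(survival)
library(dplyr)

# Same data as in the question
df1 <- data.frame(obs = c("A", "B", "C", "D", "E"),
                  lo  = c(10, 20, 30, 40, 50),
                  hi  = c(30, 30, 30, 40, 50))
df1$conc <- Surv(df1$lo, df1$hi, type = "interval2")
df2 <- df1

df3 <- bind_rows(df1, df2)

class(df3$conc)             # class of the combined column
attr(df3$conc, "type")      # censoring type stored on the Surv object
inherits(df3$conc, "Surv")  # TRUE if the Surv class survived the bind
```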

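If we need to use rbind, bind only the ordinary columns (conc is stored as a matrix and loses its Surv class under rbind), then assign the concatenated conc separately, since c() on Surv objects returns a Surv object: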

nm1 <- setdiff(names(df1), 'conc')
df3 <- rbind(df1[nm1], df2[nm1])
df3$conc <- c(df1$conc, df2$conc)
df3
#   obs lo hi     conc
#1    A 10 30 [10, 30]
#2    B 20 30 [20, 30]
#3    C 30 30       30
#4    D 40 40       40
#5    E 50 50       50
#6    A 10 30 [10, 30]
#7    B 20 30 [20, 30]
#8    C 30 30       30
#9    D 40 40       40
#10   E 50 50       50
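
The same steps can be wrapped in a small function so that the Surv columns do not have to be named by hand. This is only a sketch: rbind_keep_surv is a hypothetical helper name, not part of survival or base R, and it assumes every input data frame has the same columns as the first one and that the installed survival version provides the c() method for Surv objects used above.

```r
library(survival)

# Hypothetical helper: row-bind data frames while keeping Surv columns intact.
# Assumes all inputs share the column names of the first data frame.
rbind_keep_surv <- function(...) {
  dfs <- list(...)
  # Find the Surv columns by inspecting the first data frame
  is_surv    <- vapply(dfs[[1]], inherits, logical(1), what = "Surv")
  surv_cols  <- names(dfs[[1]])[is_surv]
  plain_cols <- setdiff(names(dfs[[1]]), surv_cols)

  # rbind only the ordinary columns
  out <- do.call(rbind, lapply(dfs, function(d) d[plain_cols]))

  # Concatenate each Surv column with c() and attach it afterwards;
  # Surv columns therefore end up after the ordinary columns
  for (nm in surv_cols) {
    out[[nm]] <- do.call(c, lapply(dfs, function(d) d[[nm]]))
  }
  out
}

# Usage with the data from the question
df1 <- data.frame(obs = c("A", "B", "C", "D", "E"),
                  lo  = c(10, 20, 30, 40, 50),
                  hi  = c(30, 30, 30, 40, 50))
df1$conc <- Surv(df1$lo, df1$hi, type = "interval2")
df2 <- df1

df3 <- rbind_keep_surv(df1, df2)
class(df3$conc)
```

Because the Surv columns are re-attached at the end, the column order can differ from the inputs when a Surv column is not already last; reorder with df3[names(df1)] if that matters.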
akrun