0

I'm trying to find an efficient way to create a dataset that contains the original data, coefficient estimates, model fit, and fitted observations from a coxph() survival model. Currently, my code looks like this:

ex_model <- lung%>% #lung dataset from survival package
    nest(-sex)%>%
    mutate(fit = map(data, ~
                                        coxph(Surv(time, status) ~ age +
                                                        wt.loss +
                                                        meal.cal,
                                                    data = .)),
                glance = map(fit, glance),
                tidy = map(fit, tidy)))%>%
    glimpse()

# output

Columns: 5
# $ sex    <dbl> 1, 2
# $ data   <list> [<tbl_df[138 x 9]>], [<tbl_df[90 x 9]>]
# $ fit    <list> [0.0205745842, 0.0047088643, -0.0001776546, 1.962976e-04, -3.220915e-06, 1.011575e-06, -3.220915e-06, 5.999414e-05, -1.525540e-07, 1.011575e-06, -1.~
# $ glance <list> [<tbl_df[1 x 18]>], [<tbl_df[1 x 18]>]
# $ tidy   <list> [<tbl_df[3 x 5]>], [<tbl_df[3 x 5]>]

which gives me a dataframe with a column for the nest variable (sex), and four list columns data, fit, glance, and tidy. I would like to add a column augment containing the fittedd values for each observation but have been unsucessful mapping the augment function to fit.

Here is an example of code which generates my desired output using lm() instead of coxph()

ex_model <- lung%>% #lung dataset from survival package
    nest(-sex)%>%
    mutate(fit = map(data, ~
                                        lm(status ~ age +
                                                        wt.loss +
                                                        meal.cal,
                                                    data = .)),
                glance = map(fit, glance),
                tidy = map(fit, tidy),
                augment = map(fit, augment))%>%
    glimpse()

# output
# $ sex     <dbl> 1, 2
# $ data    <list> [<tbl_df[138 x 9]>], [<tbl_df[90 x 9]>]
# $ fit     <list> [1.415301e+00, 7.049341e-03, 6.981800e-04, -7.171368e-05, 0.18272053, 0.25767747, -0.90016296, 0.25907699, 0.15038564, 0.23754238, 0.22724620, 0.32~
# $ glance  <list> [<tbl_df[1 x 12]>], [<tbl_df[1 x 12]>]
# $ tidy    <list> [<tbl_df[4 x 5]>], [<tbl_df[4 x 5]>]
# $ augment <list> [<tbl_df[106 x 11]>], [<tbl_df[65 x 11]>]

When I use the mutate(augment = map(fit, augment))%>% syntax with coxph(), RStudio returns an error:

Did you want `data = c(inst, time, status, age, ph.ecog, ph.karno, pat.karno, meal.cal, 
    wt.loss)`?Error: Problem with `mutate()` input `augment`.
x Must specify either `data` or `newdata` argument.
i Input `augment` is `map(fit, augment)`.

Is this a problem with my syntax, or is there a more fundamental reason I can't augment(fit) here? What is the most efficient way around this issue?

Rob Lytle
  • 5
  • 2
  • Is the data(lung) from survminer as I couldn't load that data from the package – akrun Aug 13 '21 at 19:02
  • @akrun, my mistake, the lung dataset is in the survival package – Rob Lytle Aug 13 '21 at 19:11
  • I tried that too earlier :) > library(survival) > data(lung) Warning message: In data(lung) : data set ‘lung’ not found` – akrun Aug 13 '21 at 19:12
  • @akrun, not sure why data(lung) doesn't work, I just called the dataset with `lung` without loading it into the environment. If survival is loaded you should be able to just call the piped code in the original question without first assigning the dataset to an object (or just call `lung_df <- lung`, etc). – Rob Lytle Aug 13 '21 at 20:06
  • maybe it is package loading issue. Not clear though – akrun Aug 13 '21 at 20:06
  • ,@akrun: `lung data(cancer, package="survival")` – TarJae Aug 13 '21 at 20:14
  • @TarJae I'm guessing that the intent was to contribute: `lung <- data(cancer, package="survival")` – IRTFM Aug 15 '21 at 01:22

1 Answers1

0

The error you want to focus on is

x Must specify either `data` or `newdata` argument.

This error comes from augment() and it says that to use augment you have to pass in data in addition to the model. So you need to do augment(model_fit, newdata = my_new_data). For more information on augment.coxph() look here.

library(tidyverse)
library(survival)
library(broom)

ex_models <- lung %>%
    nest(data = c(inst, time, status, age, ph.ecog, ph.karno, pat.karno, meal.cal, wt.loss)) %>%
    mutate(fit = map(data, ~ coxph(Surv(time, status) ~ age + wt.loss + meal.cal, data = .)),
           glance = map(fit, glance),
           tidy = map(fit, tidy),
           augment = map(fit, augment, newdata = lung))

glimpse(ex_models)
#> Rows: 2
#> Columns: 6
#> $ sex     <dbl> 1, 2
#> $ data    <list> [<tbl_df[138 x 9]>], [<tbl_df[90 x 9]>]
#> $ fit     <list> [0.0205745842, 0.0047088643, -0.0001776546, 1.962976e-04, -3.2…
#> $ glance  <list> [<tbl_df[1 x 18]>], [<tbl_df[1 x 18]>]
#> $ tidy    <list> [<tbl_df[3 x 5]>], [<tbl_df[3 x 5]>]
#> $ augment <list> [<tbl_df[228 x 12]>], [<tbl_df[228 x 12]>]

Created on 2021-08-14 by the reprex package (v2.0.1)

EmilHvitfeldt
  • 2,555
  • 1
  • 9
  • 12
  • Thank you very much, this solved the problem. I was calling `augment()` using the formula syntax inside map. Did not realize that additional arguments could be called using the syntax you provided. Do you have any resources you'd recommend for better understanding `map()` beyond the package documentation? – Rob Lytle Aug 14 '21 at 22:07