Code I used for data entry:

library(tidyverse)
library(survminer)
library(flexsurv)
library(survival)
library(finalfit)

data = read_delim("data.csv", 
    ",", escape_double = FALSE, trim_ws = TRUE)
data = data %>%
      mutate_if(is.character, as.factor)

A simple model with group as the only covariate plots the predictions for the two study groups over the KM curves:

model1 = flexsurvreg(Surv(time, status) ~ group, dist="weibull", data = data)
plot(model1)

However, when I add other variables into the model, the plot does not draw separate lines for group A and B.

model2 = flexsurvreg(Surv(time, status) ~ group + age + sex, dist="weibull", data = data)
plot(model2)

Thus, I used the "newdata" argument to plot separate curves for group A and B:

newdata = data.frame(group=c("A", "B"), age = 50, sex = c("f","m"))

And then plotted model predictions on KM curves:

KM = survfit(Surv(time, status) ~ group, data=data)
plot(KM, col="black")
lines(model2, newdata = newdata, col = "red")

LINK TO PLOT

The fitted lines do not match the KM curves well, and I think I have found the problem. The "newdata" data frame pairs group A with females only and group B with males only. Please check the summary table:

LINK TO TABLE

Code I used for the summary table:

summary(model2, newdata = newdata, ci = TRUE, tidy = TRUE)
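
To illustrate the pairing issue without fitting anything, here is a small base R sketch. It uses the same covariate values as my "newdata" (the object names `paired` and `crossed` are only for illustration). `data.frame()` lines the vectors up element by element, whereas `expand.grid()` builds every group/sex combination:

```r
# data.frame() pairs the vectors row by row: A with f, B with m
paired <- data.frame(group = c("A", "B"), age = 50, sex = c("f", "m"))

# expand.grid() crosses the values: A/f, B/f, A/m, B/m
crossed <- expand.grid(group = c("A", "B"), age = 50, sex = c("f", "m"),
                       stringsAsFactors = FALSE)

print(paired)
print(crossed)
```

Passing the crossed version as "newdata" would give four curves (one per combination), which still is not a single curve per group.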

How can I solve this, that is, get one prediction for group A (averaged over females and males) and one for group B (averaged over females and males)?
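
One approach that might do this is to average the predictions over the observed covariates: set everyone's group to A, predict a survival curve for each subject with `summary()`, average the curves at each time point, and then repeat for B. The sketch below is only an illustration under assumptions. It uses simulated data instead of my data.csv, and the helper `average_curve()` and the data frame `sim` are hypothetical names I made up. Whether this averaging is the right estimand for a given study is a separate question.

```r
library(survival)
library(flexsurv)

# Simulated stand-in for data.csv (assumption: same column names as in the question)
set.seed(1)
n <- 200
sim <- data.frame(
  group = factor(sample(c("A", "B"), n, replace = TRUE)),
  age   = round(rnorm(n, mean = 50, sd = 10)),
  sex   = factor(sample(c("f", "m"), n, replace = TRUE))
)
event_time <- rweibull(n, shape = 1.5,
                       scale = exp(3 - 0.5 * (sim$group == "B")))
cens_time  <- runif(n, min = 0, max = 60)
sim$time   <- pmin(event_time, cens_time)
sim$status <- as.numeric(event_time <= cens_time)

fit <- flexsurvreg(Surv(time, status) ~ group + age + sex,
                   dist = "weibull", data = sim)

times <- seq(0, max(sim$time), length.out = 50)

# Hypothetical helper: predict for every subject as if they were in
# group `level`, then average the survival estimates at each time point
average_curve <- function(fit, data, level, times) {
  nd <- data[, c("group", "age", "sex")]
  nd$group <- factor(level, levels = levels(data$group))
  pred <- summary(fit, newdata = nd, t = times,
                  type = "survival", ci = FALSE, tidy = TRUE)
  aggregate(est ~ time, data = pred, FUN = mean)
}

curve_A <- average_curve(fit, sim, "A", times)
curve_B <- average_curve(fit, sim, "B", times)

# Overlay the averaged model curves on the KM curves
km <- survfit(Surv(time, status) ~ group, data = sim)
plot(km, col = "black", lty = 1:2)
lines(curve_A$time, curve_A$est, col = "red", lty = 1)
lines(curve_B$time, curve_B$est, col = "red", lty = 2)
```

This gives one fitted curve per group, with age and sex averaged over the whole sample rather than fixed at a single combination. It does not produce confidence intervals for the averaged curves.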

Here is the code and data: R project folder

st4co4
  • You should show the code used to do the data entry. – IRTFM Apr 24 '20 at 17:22
  • I'm wondering if you read the help page for `plot.flexsurvreg`: "To plot Kaplan-Meier and fitted curves for only a subset of groups, use `plot(survfit())` followed by `lines.flexsurvreg()`. If there are any continuous covariates, then a single population Kaplan-Meier curve is drawn. By default, a single fitted curve is drawn with the covariates set to their mean values in the data - for categorical covariates, the means of the 0/1 indicator variables are taken." – IRTFM Apr 24 '20 at 20:35
  • Thank you for the comments! I think I finally figured out the code for plotting the predictions against KM curves. I added the result to the end of my initial post. The lines do not converge much. What to do? Still check best AIC and log-likelihood and publish the results? – st4co4 Apr 27 '20 at 14:23
  • This is a coding Q/A site. You seem to have solved the initial question and now extended it. If you want to get code-focused answers you should post more code. – IRTFM Apr 27 '20 at 19:49
  • Thank you! This seems to be a coding error that caused the problem (defining the "newdata" argument). I edited the initial post and added as much code as possible. Is there a solution for the final question? – st4co4 Apr 28 '20 at 13:19
  • When I read the help page for `summary.flexsurvreg` it says 'newdata' needs to have every column named in the formula. Perhaps you also need 'time' and 'status'? I'm guessing any entry for `status` might be sufficient, but I would probably use 0. – IRTFM Apr 28 '20 at 19:25
  • I read all possible help pages for this. Every column is named in "newdata". Status and time have different purposes. The "newdata" argument should be defined similarly as it is done for "predict.lm". Despite efforts, I haven't figured out how to do it. Why is that so complicated? It's a basic prediction that is needed for two groups. Studies are not only interested in combinations of different factor levels. – st4co4 Apr 30 '20 at 07:42
  • I made a separate and simple R project folder for this question. This is given at the end of the initial post. – st4co4 Apr 30 '20 at 08:44
