
I am running an accelerated failure time (AFT) Weibull model in R using the survival package. The model includes a factor variable and uses cluster-robust standard errors.

I now need to estimate marginal effects, i.e. the change in survival time for a one-standard-deviation increase (or a one-unit increase when the covariate is a dummy variable) in each of my covariates. I tried the margins package, but it isn't tested with Surv objects and I received a bunch of error messages.

So I resorted to calculating them with the predict function, so far without success. This is what I have devised:

  library(survival)
  library(dplyr)

  # Model: Weibull AFT with cluster-robust standard errors
  ourdata$NOTCENSORED <- 1
  Execweibullmodel <- survreg(
    Surv(TOTALDAYSTORECEIVED, NOTCENSORED) ~ DifferenceinPartyMeansfirstby10s +
      PartyImbalance2by10s + LengthOfPlanningin10s + FirstHundred_intent +
      WHWorkload_totalin10s + AvgWkIntents + priorityappoint + IRC_Dummy +
      factor(Policy_num) + RepublicanPresident + MonthlyPresApprovalAvg_Received +
      female + cluster(presidentnumber),
    data = ourdata, dist = "weibull", robust = TRUE
  )

  # Create a mock dataset holding all continuous covariates constant at their
  # means. Below I vary LengthOfPlanningin10s, because I want to calculate the
  # change in survival time when I increase it from 70 to 248.

  ND <- with(ourdata, expand.grid(
    DifferenceinPartyMeansfirstby10s = 6.9695041,
    PartyImbalance2by10s = 1.0748024,
    LengthOfPlanningin10s = c(70, 248),
    FirstHundred_intent = 0.2840300,
    WHWorkload_totalin10s = 74.7713290,
    AvgWkIntents = 12.32512,
    priorityappoint = 4,
    IRC_Dummy = 0.176,
    Policy_num = levels(ourdata$Policy_num),
    RepublicanPresident = 0.638,
    MonthlyPresApprovalAvg_Received = 56.2,
    female = 0.256,
    presidentnumber = levels(ourdata$presidentnumber)
  ))
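
For reference, the constants above are the sample means of the continuous covariates (the dummy values appear to be sample proportions). A quick sketch of how they can be computed:

  # Sample means of the covariates held constant in ND (dummies at their proportions)
  colMeans(ourdata[, c("DifferenceinPartyMeansfirstby10s", "PartyImbalance2by10s",
                       "FirstHundred_intent", "WHWorkload_totalin10s", "AvgWkIntents",
                       "IRC_Dummy", "RepublicanPresident",
                       "MonthlyPresApprovalAvg_Received", "female")], na.rm = TRUE)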

  # Calculate predicted values (linear predictor) and standard errors
  prs <- predict(Execweibullmodel, ND, se.fit = TRUE, type = "lp")
  ND$pred <- prs$fit
  ND$se <- prs$se.fit
  ND$lo <- ND$pred - 1.96 * ND$se
  ND$up <- ND$pred + 1.96 * ND$se
  head(ND, 10)

  # Calculate the change in the predicted value from one level to the next,
  # and average across the factor variables
  newobject3 <- ND %>%
    group_by(Policy_num, presidentnumber) %>%
    dplyr::mutate(dydx2 = c(NA, diff(pred)))

  # Calculate the average change
  mef1 <- mean(newobject3$dydx2, na.rm = TRUE)

This does give me a value, but it is very different from the marginal effect produced by Stata. Stata uses a slightly different method for calculating cluster-robust standard errors, so the coefficient estimates are only slightly off (by at most a decimal place). However, the marginal effects Stata gives me are orders of magnitude larger than what I get with this method, so I think I am doing something wrong.
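
One sanity check I am considering (just a sketch, not a confirmed fix): type = "lp" returns predictions on the log-time scale of the AFT model, whereas Stata's marginal effects after streg are typically reported on the time scale, which by itself could account for a large difference in magnitude. Repeating the calculation with type = "response" (predictions on the original time scale) would look roughly like this; pred_t, newobject4, and dydx_t are just illustrative names:

  # Repeat the prediction on the response (time) scale rather than the
  # log-time (linear predictor) scale
  prs_t <- predict(Execweibullmodel, ND, se.fit = TRUE, type = "response")
  ND$pred_t <- prs_t$fit

  newobject4 <- ND %>%
    group_by(Policy_num, presidentnumber) %>%
    dplyr::mutate(dydx_t = c(NA, diff(pred_t)))

  # Average change in predicted survival time (days) from 70 to 248
  mean(newobject4$dydx_t, na.rm = TRUE)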

Some data (my dataset is large so these first 20 obs might not be enough):

ourdata <-
structure(list(TOTALDAYSTORECEIVED = c(538, 561, 272, 384, 308, 
371, 184, 561, 167, 552, 254, 247, 37, 149, 55, 293, 440, 62, 
288, 461), DifferenceinPartyMeansfirstby10s = c(5.55999994277954, 
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954, 
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954, 
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954, 
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954, 
5.55999994277954, 5.55999994277954, 5.55999994277954), PartyImbalance2by10s = c(1.81818187236786, 
1.81818187236786, 1.41414141654968, 1.41414141654968, 1.41414141654968, 
1.41414141654968, 1.41414141654968, 1.81818187236786, 1.41414141654968, 
1.81818187236786, 1.41414141654968, 1.41414141654968, 1.41414141654968, 
1.41414141654968, 1.41414141654968, 1.41414141654968, 1.63265299797058, 
1.41414141654968, 1.41414141654968, 1.63265299797058), LengthOfPlanningin10s = c(35.4000015258789, 
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789, 
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789, 
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789, 
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789, 
35.4000015258789, 35.4000015258789, 35.4000015258789), FirstHundred_intent = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0), WHWorkload_totalin10s = c(30.6000003814697, 
28.6000003814697, 51.4000015258789, 43.5999984741211, 48, 45.4000015258789, 
59.2000007629395, 28.6000003814697, 63, 29.3999996185303, 52.4000015258789, 
53.2999992370605, 83.9000015258789, 65.5, 82.8000030517578, 48.9000015258789, 
38.4000015258789, 80.1999969482422, 49.5, 37.2999992370605), 
    AvgWkIntents = c(5, 11, 16, 16, 17, 17, 12, 11, 11, 3, 4, 
    17, 9, NA, NA, 10, 16, 9, 13, 4), priorityappoint = c(5L, 
    6L, 2L, 2L, 5L, 5L, 5L, 6L, 5L, 3L, 5L, 5L, 5L, 6L, 5L, 5L, 
    5L, 5L, 5L, 5L), IRC_Dummy = c(0, 1, 1, 1, 1, 1, 0, 1, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Policy_num = structure(c(1L, 
    8L, 8L, 8L, 8L, 8L, 9L, 8L, 10L, 8L, 9L, 3L, 6L, 8L, 10L, 
    5L, 1L, 4L, 8L, 8L), .Label = c("SocialWelfare", "Agriculture", 
    "Commerce", "Infrastructure", "Justice", "Labor", "Treasury", 
    "Other", "Defense", "ForeignPolicy"), class = "factor"), 
    RepublicanPresident = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), MonthlyPresApprovalAvg_Received = c(42, 
    41.3333320617676, 56, 47, 52, 48, 58, 41.3333320617676, 58, 
    42, 56, 52, 53, 58.6666679382324, 60, 52, 44, 60, 52, 44), 
    female = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 
    1, 0, 1, 0), presidentnumber = structure(c(1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("Reagan", "GHW Bush", "Clinton", "GW Bush", 
    "Obama", "Trump"), class = "factor")), .Names = c("TOTALDAYSTORECEIVED", 
"DifferenceinPartyMeansfirstby10s", "PartyImbalance2by10s", "LengthOfPlanningin10s", 
"FirstHundred_intent", "WHWorkload_totalin10s", "AvgWkIntents", 
"priorityappoint", "IRC_Dummy", "Policy_num", "RepublicanPresident", 
"MonthlyPresApprovalAvg_Received", "female", "presidentnumber"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"
), class = "data.frame")
Heather Ba
  • So if I want to put my data up, using dput(), can you give me more instructions on how to do this? – Heather Ba Jul 16 '18 at 20:41
    I pasted the output into the question. I do have a large amount of data, so hopefully that provides enough. Doing some googling for similar simulation techniques for the Weibull. – Heather Ba Jul 16 '18 at 21:04
  • Perfect, great! Thank you. I will give it a try with your data this evening if I have time. – Hack-R Jul 16 '18 at 22:51
  • Thanks! I've got a few knowledgeable friends who I've consulted with and they all say they use a different program for survival analysis for reasons like this. So hopefully someone on stack overflow will have some insight. – Heather Ba Jul 16 '18 at 23:08
  • Also, take note that I update the text in my question. I am able to use this code, and modify the values of the covariates in the mock data set used for predict to get different predicted values. However, the final number I arrive at (the average of the change in the predicted values), is much much smaller than the marginal effects I get out of Stata after I run a comparable model (the coefficient estimates Stata produces for the Weibull model are very similar so there shouldn't be a huge discrepancy in the marginal effect). This makes me think that I am not doing something right. – Heather Ba Jul 16 '18 at 23:37
  • If you don't get the answer you need here I can delete my answer and you can try re-asking it on Cross Validated – Hack-R Jul 17 '18 at 11:12
  • Thanks. I appreciate it. I did some more crowdsourcing of info and advice among knowledgeable colleagues and the answer is the same -- survival analysis packages in R are underdeveloped. Someone gave me some good tips on getting what I need out of Stata, so I am going to pursue that for the moment. – Heather Ba Jul 17 '18 at 14:35
  • Yw. I don't think that's accurate at all, though; more likely *their knowledge of it* is under-developed. Stata is a very limited little tool that came into, and started falling out of, popularity before the rise of Python and R in statistics. I would say they are strictly better from the perspective of the available statistics, and I would strongly recommend against wasting your time learning the starkly limited features in Stata. Brain drain has decimated its community over the past decade. Not having time to learn R libraries is not the same as those libraries being underdeveloped. – Hack-R Jul 17 '18 at 14:40

1 Answer


While you're collecting your example data, here is a simulation-based approach, using the simPH package, that I adapted from this source:

# Load libraries; the CarpenterFdaData example dataset ships with simPH
library(survival)
library(simPH)
data("CarpenterFdaData")

# Run a Cox model with an interaction
model <- coxph(Surv(acttime, censor) ~ lethal * prevgenx, data = CarpenterFdaData)

# Simulate the marginal effect of interest
marginal_effects <- coxsimInteract(model,
                                   b1 = "lethal",
                                   b2 = "prevgenx",
                                   qi = "Marginal Effect",
                                   X2 = seq(2, 115, by = 2), nsim = 1000)

[Plot of the simulated marginal effect of lethal across values of prevgenx]
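
A figure like the one above can be produced with simPH's simGG() plotting method (the exact plotting call isn't shown here, so this is an assumed reconstruction):

# Plot the simulated marginal effects (assumed reconstruction of the figure above)
simGG(marginal_effects)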

Hack-R