I am running an accelerated failure time (AFT) Weibull model in R using the survival package. It includes a factor variable AND has cluster robust standard errors.
I now need to estimate marginal effects. In other words, I need to estimate the change in survival time for a one standard deviation increase (or one unit increase when the covariate is a dummy variable) for each of my covariates. I tried using the margins package, but it's not tested for the Surv
objects and I received a bunch of error messages.
So I resorted to trying to calculate them using the predict function. However, I have so far been unsuccessful. This is what I have devised so far:
library(survival)
library(dplyr)
#Model
ourdata$NOTCENSORED=1
Execweibullmodel <-survreg(Surv(TOTALDAYSTORECEIVED,
NOTCENSORED)~DifferenceinPartyMeansfirstby10s + PartyImbalance2by10s +
LengthOfPlanningin10s + FirstHundred_intent + WHWorkload_totalin10s +
AvgWkIntents + priorityappoint + IRC_Dummy + factor(Policy_num) +
RepublicanPresident + MonthlyPresApprovalAvg_Received + female
+cluster(presidentnumber), data=ourdata, dist="weibull", robust=TRUE)
#Create mock dataset holding all continuous covariates constant at their
mean. Below I am changing LengthOfPlanningin10s, because I want to
calculate the change in survival time when I increase it from 70 to 248.
ND <- with(ourdata, expand.grid(
DifferenceinPartyMeansfirstby10s = 6.9695041,
PartyImbalance2by10s=1.0748024,
LengthOfPlanningin10s=c(70, 248),
FirstHundred_intent=0.2840300,
WHWorkload_totalin10s=74.7713290,
AvgWkIntents= 12.32512,
priorityappoint=4,
IRC_Dummy=c(.176),
Policy_num=levels(ourdata$Policy_num),
RepublicanPresident=c(.638),
MonthlyPresApprovalAvg_Received=c(56.2),
female=c(.256),
presidentnumber=levels(ourdata$presidentnumber)
))
#Calculate predicted values
prs <- predict(Execweibullmodel, ND, se.fit = TRUE, type = "lp")
ND$pred <- prs[[1]]
ND$se <- prs[[2]]
ND$lo <- ND$pred - 1.96 * ND$se
ND$up <- ND$pred + 1.96 * ND$se
head(ND, 10)
#Calculate the change in the predicted value from one level to the next,and
average across the factor variables
newobject3=ND %>%
group_by(Policy_num, presidentnumber ) %>%
dplyr::mutate(dydx2 = c(NA, diff(pred)))
#Calculate the average change
mef1=mean(newobject3$dydx2, na.rm=TRUE)
This does give me a value, but it is very different from the marginal effect produced by Stata. Stata uses a slightly different method for calculating cluster robust standard errors, so the coefficient estimates are slightly off by at most a decimal place. However, the marginal effects that Stata give me are magnitudes larger than what I get using this method. So I think I am doing something wrong.
Some data (my dataset is large so these first 20 obs might not be enough):
ourdata <-
structure(list(TOTALDAYSTORECEIVED = c(538, 561, 272, 384, 308,
371, 184, 561, 167, 552, 254, 247, 37, 149, 55, 293, 440, 62,
288, 461), DifferenceinPartyMeansfirstby10s = c(5.55999994277954,
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954,
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954,
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954,
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954,
5.55999994277954, 5.55999994277954, 5.55999994277954), PartyImbalance2by10s = c(1.81818187236786,
1.81818187236786, 1.41414141654968, 1.41414141654968, 1.41414141654968,
1.41414141654968, 1.41414141654968, 1.81818187236786, 1.41414141654968,
1.81818187236786, 1.41414141654968, 1.41414141654968, 1.41414141654968,
1.41414141654968, 1.41414141654968, 1.41414141654968, 1.63265299797058,
1.41414141654968, 1.41414141654968, 1.63265299797058), LengthOfPlanningin10s = c(35.4000015258789,
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789,
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789,
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789,
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789,
35.4000015258789, 35.4000015258789, 35.4000015258789), FirstHundred_intent = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0), WHWorkload_totalin10s = c(30.6000003814697,
28.6000003814697, 51.4000015258789, 43.5999984741211, 48, 45.4000015258789,
59.2000007629395, 28.6000003814697, 63, 29.3999996185303, 52.4000015258789,
53.2999992370605, 83.9000015258789, 65.5, 82.8000030517578, 48.9000015258789,
38.4000015258789, 80.1999969482422, 49.5, 37.2999992370605),
AvgWkIntents = c(5, 11, 16, 16, 17, 17, 12, 11, 11, 3, 4,
17, 9, NA, NA, 10, 16, 9, 13, 4), priorityappoint = c(5L,
6L, 2L, 2L, 5L, 5L, 5L, 6L, 5L, 3L, 5L, 5L, 5L, 6L, 5L, 5L,
5L, 5L, 5L, 5L), IRC_Dummy = c(0, 1, 1, 1, 1, 1, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Policy_num = structure(c(1L,
8L, 8L, 8L, 8L, 8L, 9L, 8L, 10L, 8L, 9L, 3L, 6L, 8L, 10L,
5L, 1L, 4L, 8L, 8L), .Label = c("SocialWelfare", "Agriculture",
"Commerce", "Infrastructure", "Justice", "Labor", "Treasury",
"Other", "Defense", "ForeignPolicy"), class = "factor"),
RepublicanPresident = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), MonthlyPresApprovalAvg_Received = c(42,
41.3333320617676, 56, 47, 52, 48, 58, 41.3333320617676, 58,
42, 56, 52, 53, 58.6666679382324, 60, 52, 44, 60, 52, 44),
female = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
1, 0, 1, 0), presidentnumber = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("Reagan", "GHW Bush", "Clinton", "GW Bush",
"Obama", "Trump"), class = "factor")), .Names = c("TOTALDAYSTORECEIVED",
"DifferenceinPartyMeansfirstby10s", "PartyImbalance2by10s", "LengthOfPlanningin10s",
"FirstHundred_intent", "WHWorkload_totalin10s", "AvgWkIntents",
"priorityappoint", "IRC_Dummy", "Policy_num", "RepublicanPresident",
"MonthlyPresApprovalAvg_Received", "female", "presidentnumber"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"
), class = "data.frame")