I am trying to use the MOB procedure from the R package partykit
to predict survival probabilities based on a set of covariates X1,...,X25 and a treatment effect W. The linear predictor in each node in MOB only uses W, X1, and X2, but each covariate is used for selection for node splitting. I would like to force the MOB to only split according to parameter instability for the treatment effect W. When doing prediction in the final line of code below, I get the following error:
Error in rval[ix[[i]], ] <- preds[[i]] :
number of items to replace is not a multiple of replacement length
In addition: Warning messages:
1: 'newdata' had 1070 rows but variables found have 1029 rows
2: 'newdata' had 1337 rows but variables found have 1291 rows
3: 'newdata' had 1690 rows but variables found have 1680 rows
4: 'newdata' had 903 rows but variables found have 1000 rows
I believe this error occurs because the number of test observations falling in each terminal node is different than that of the training observations. How can I modify the predict statement to handle this issue and obtain predictions on the test set? I would also like to know if I'm using the parm
option correctly in specifying that parameter instability should be assessed according to W.
library("survival")
library("partykit")
n=5000;n.test=5000;p=25;pi=0.5;beta=1
gamma=0.5;rho=2;cen.scale=4;n.mc=10000;
Y.max=2
generate_data <- function(n, p, pi = 0.5, beta = 1, gamma = 1, rho = 2, cen.scale = 4,
Y.max = NULL){
W <- rbinom(n, 1, pi)
X <- matrix(rnorm(n * p), n, p)
numerator <- -log(runif(n))
cox.ft <- (numerator / exp(beta * X[ ,1] + (-0.5 - gamma * X[ ,2]) * W))^2
failure.time <- pmin(cox.ft, Y.max)
numeratorC <- -log(runif(n))
censor.time <- (numeratorC / (cen.scale ^ rho)) ^ (1 / rho)
Y <- pmin(failure.time, censor.time)
D <- as.integer(failure.time <= censor.time)
list(X = X, Y = Y, W = W, D = D)
}
data <- generate_data(n, p=p, pi = pi, beta = beta, gamma = gamma, rho = rho, cen.scale = cen.scale,
Y.max = Y.max)
data.test <- generate_data(n.test, p=p, pi = pi, beta = beta, gamma = gamma, rho = rho, cen.scale = cen.scale,
Y.max = Y.max)
X=data$X
Y=data$Y
W=data$W
D=data$D
var_prog <- c("X1","X2")
colnames(X) <- paste("X", 1:25, sep="")
cov.names <- colnames(X)
wbreg <- function(y, x, start = NULL, weights = NULL, offset = NULL, ...) {
survreg(y ~ 0 + x, weights = weights, dist = "weibull", ...)
}
dat <- data.frame(Y=Y,D=D,W=W,X)
eqn <- paste0("Surv(Y, D) ~ W + ",paste0(var_prog, collapse = "+")," | ",
paste0(cov.names, collapse = "+"))
glmtr <- partykit::mob(as.formula(eqn), data = dat,
fit = wbreg, control = mob_control(parm=2,minsize = 0.2*nrow(dat),
alpha = 0.10, bonferroni = TRUE))
plot(glmtr)
dat.test <- data.frame(Y=data.test$Y,D=data.test$D, W=data.test$W,data.test$X)
pct <- 1:98/100
quantile_pred <- predict(glmtr, newdata = dat.test, type = "quantile",p=pct)