I could not find a simple function to do this. There is some code in the predict function that depends on having a fitted model (for example, determining the rank of the model). However, we can write a function that builds a fake glm object which you can then use with predict. Here's my first attempt at such a function:
makeglm <- function(formula, family, data=NULL, ...) {
    dots <- list(...)
    out <- list()
    tt <- terms(formula, data=data)
    if(!is.null(data)) {
        # building the model frame errors early if `data` is missing
        # any of the variables named in the formula
        mf <- model.frame(tt, data)
        # get the variable names, dropping the response
        vn <- sapply(attr(tt, "variables")[-1], deparse)
        if((yvar <- attr(tt, "response")) > 0)
            vn <- vn[-yvar]
        # record the levels of every factor (or character) covariate
        xlvl <- lapply(data[vn], function(x)
            if (is.factor(x)) levels(x)
            else if (is.character(x)) levels(as.factor(x))
            else NULL)
        # predict.lm looks for the factor levels on the object itself...
        out$xlevels <- xlvl[!vapply(xlvl, is.null, NA)]
        # ...and for the data classes as an attribute of the terms object
        attr(tt, "dataClasses") <- sapply(data[vn], stats:::.MFclass)
    }
    out$terms <- tt
    # assemble the coefficient vector from the ... arguments
    coef <- numeric(0)
    stopifnot(length(dots) > 1, !is.null(names(dots)))
    for(i in seq_along(dots)) {
        if((n <- names(dots)[i]) != "") {
            v <- dots[[i]]
            if(!is.null(names(v))) {
                # named vector: one coefficient per factor level,
                # e.g. X2=c(b=1.5) becomes the coefficient "X2b"
                coef[paste0(n, names(v))] <- v
            } else {
                # a single value for a numeric covariate
                stopifnot(length(v) == 1)
                coef[n] <- v
            }
        } else {
            # the unnamed argument is taken as the intercept
            coef["(Intercept)"] <- dots[[i]]
        }
    }
    out$coefficients <- coef
    # the remaining pieces that predict() and print() expect to find
    out$rank <- length(coef)
    out$qr <- list(pivot=seq_len(out$rank))
    out$family <- if (inherits(family, "family")) {
        family
    } else if (is.function(family)) {
        family()
    } else {
        stop(paste("invalid family class:", class(family)))
    }
    # dummy values so print.glm() has something to show
    out$deviance <- 1
    out$null.deviance <- 1
    out$aic <- 1
    class(out) <- c("glm", "lm")
    out
}
So this function creates an object and fills in all the values that predict and print expect to find on such an object. Now we can test it out. First, here's some test data:
set.seed(15)
dd <- data.frame(
    X1 = runif(50),
    X2 = factor(sample(letters[1:4], 50, replace=TRUE)),
    X3 = rpois(50, 5),
    Outcome = sample(0:1, 50, replace=TRUE)
)
And we can fit a standard binomial model with
mymodel<-glm(Outcome~X1+X2+X3, data=dd, family=binomial)
Which gives
Call: glm(formula = Outcome ~ X1 + X2 + X3, family = binomial, data = dd)
Coefficients:
(Intercept) X1 X2b X2c X2d X3
-0.4411 0.8853 1.8384 0.9455 1.5059 -0.1818
Degrees of Freedom: 49 Total (i.e. Null); 44 Residual
Null Deviance: 68.03
Residual Deviance: 62.67 AIC: 74.67
Now let's say we wanted to try out a model that we read about in a publication on the same data. Here's how we would use the makeglm function:
newmodel <- makeglm(Outcome~X1+X2+X3, binomial, data=dd,
-.5, X1=1, X2=c(b=1.5, c=1, d=1.5), X3=-.15)
The first parameter is the formula for the model. This defines the response and the covariates just like it would when running glm. Next you specify the family like you would with glm(). And you need to pass a data frame so R can sniff out the correct data types for each of the variables involved; this also identifies all of the factor variables and their levels. The data frame can be new data that's coded just like the fitted data, or it can be the original one.
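If you want to see what makeglm picked up from the data frame, you can peek at the pieces it stored: the factor levels are kept in newmodel$xlevels and the variable classes live on the terms object. With the dd above, that should look like this:

newmodel$xlevels
# $X2
# [1] "a" "b" "c" "d"

attr(newmodel$terms, "dataClasses")
#        X1        X2        X3
# "numeric"  "factor" "numeric"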
The remaining parameters specify the coefficients to use in our model. The coefficient names are built from the parameter names, and the single unnamed parameter is used as the intercept. If you have a factor, you need to give coefficients for all of its non-reference levels via a named vector (here b, c and d, since a is the baseline level). Here I just decided to round off the fitted estimates to "nice" numbers.
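The resulting coefficient vector looks just like one from a fitted glm; the values are simply the numbers we passed in:

coef(newmodel)
# (Intercept)          X1         X2b         X2c         X2d          X3
#       -0.50        1.00        1.50        1.00        1.50       -0.15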
Now I can use our newmodel with predict:
predict(mymodel, type="response")
# 1 2 3 4 5
# 0.4866398 0.3553439 0.6564668 0.7819917 0.3008108
predict(newmodel, newdata=dd, type="response")
# 1 2 3 4 5
# 0.5503572 0.4121811 0.7143200 0.7942776 0.3245525
Here I call predict on the original model and on the new model, using the old data but my specified coefficients. We can see the estimated probabilities have changed a bit.
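To see what predict is doing with this fake object, we can reproduce the predictions by hand: build the design matrix from the stored terms, multiply by our coefficients, and push the result through the inverse link. This is just a sketch and assumes the default treatment contrasts, so the model matrix columns line up with the coefficient names:

X <- model.matrix(delete.response(newmodel$terms), dd)
eta <- drop(X %*% coef(newmodel))     # linear predictor
p <- newmodel$family$linkinv(eta)     # inverse logit
all.equal(p, predict(newmodel, newdata=dd, type="response"))
# should be TRUE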
Now, I haven't thoroughly tested this function, so use it at your own risk. I don't do as much error checking as I probably should. Maybe someone else knows of a better way.
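One extra check that might be worth adding is to make sure every coefficient name we supplied matches a column that model.matrix() would actually generate from the data, so a typo in a level name gets caught early. Just a sketch (check_coef_names is a name I made up, not part of makeglm):

check_coef_names <- function(object, data) {
    mm <- model.matrix(delete.response(object$terms), data)
    # anything returned here is a coefficient with no matching column
    setdiff(names(coef(object)), colnames(mm))
}
check_coef_names(newmodel, dd)
# character(0)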