4

I have the coefficients from a glm fitted in R, and I want to predict expected values for a new set of data. If I had the model object this would be simple, using predict(). However, I am now offsite and for data confidentiality reasons I no longer have the model object. I have only the summary object, generated using summary(model), which contains the model coefficients.

It's easy enough to use the coefficients to predict expected values for a simple model. However, I would like to know how to do this when the model includes a cubic spline ns(). Any shortcuts for when the model also includes categorical variables would be appreciated as well.

Here is a simple example.

library(splines)
dat <- data.frame(x=1:500, z=runif(500), k=as.factor(sample(c("a","b"), size=500, replace=TRUE)))
kvals <- data.frame(kn=c("a","b"),kv=c(20,30))
dat$y = dat$x + (40*dat$z)^2 + kvals$kv[match(dat$k,kvals$kn)] + rnorm(500,0,30)
# Fit model
library(splines)
mod <- glm(y ~ x + ns(z,df=2) + k,data=dat)
# Create new dataset
dat.new <- expand.grid(x=1:3,z=seq(0.2,0.4,0.1),k="b")
# Predict expected values in the usual way
predict(mod,newdata=dat.new)
summ <- summary(mod)
rm(mod)
# Now, how do I predict using just the summary object and dat.new?
IRTFM
  • 258,963
  • 21
  • 364
  • 487
SDH
  • 43
  • 4

1 Answers1

2

There's probably a more efficient method to tackle this, but here's a starting point to get you set up to implement the strategy the Roland briefly suggested. The summ object has the information necessary to define the spline function, but it's kind of buried:

    names(summ)
     [1] "call"           "terms"          "family"         "deviance"       "aic"           
     [6] "contrasts"      "df.residual"    "null.deviance"  "df.null"        "iter"          
    [11] "deviance.resid" "coefficients"   "aliased"        "dispersion"     "df"            
    [16] "cov.unscaled"   "cov.scaled"    

And looking at the structure of the terms leaf, we see that the spline detail is buried deeper still inside the predvars subleaf:

str(summ$terms)
Classes 'terms', 'formula'  language y ~ x + ns(z, df = 2) + k
  ..- attr(*, "variables")= language list(y, x, ns(z, df = 2), k)
  ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:4] "y" "x" "ns(z, df = 2)" "k"
  .. .. ..$ : chr [1:3] "x" "ns(z, df = 2)" "k"
  ..- attr(*, "term.labels")= chr [1:3] "x" "ns(z, df = 2)" "k"
  ..- attr(*, "order")= int [1:3] 1 1 1
  ..- attr(*, "intercept")= int 1
  ..- attr(*, "response")= int 1
  ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  ..- attr(*, "predvars")= language list(y, x, ns(z, knots = structure(0.514993450604379, .Names = "50%"), Boundary.knots = c(0.00118412892334163,  0.99828373757191), intercept = FALSE), k)
  ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "nmatrix.2" "factor"
  .. ..- attr(*, "names")= chr [1:4] "y" "x" "ns(z, df = 2)" "k"

So pull the attribute out:

str(attributes(summ$terms)$predvars)
 language list(y, x, ns(z, knots = structure(0.514993450604379, .Names = "50%"),
               Boundary.knots = c(0.00118412892334163,  0.99828373757191), intercept = FALSE), k)

You can see that it is possible to recover the spline if you supply the x,y,z, and k values needed:

with(dat, ns(z, knots = 0.514993450604379, Boundary.knots = c(0.00118412892334163, 
 0.99828373757191), intercept = FALSE) )
#---
                 1             2
  [1,] 5.760419e-01 -1.752762e-01
  [2,] 2.467001e-01 -1.598936e-01
  [3,] 4.392684e-01  4.799757e-01
snipping ....
[498,] 4.965628e-01 -2.576437e-01
[499,] 5.627389e-01  1.738909e-02
[500,] 2.393920e-02 -1.611872e-02
attr(,"degree")
[1] 3
attr(,"knots")
[1] 0.5149935
attr(,"Boundary.knots")
[1] 0.001184129 0.998283738
attr(,"intercept")
[1] FALSE
attr(,"class")
[1] "ns"     "basis"  "matrix"

You can build a substitute dat, if you know the extremes of your data. See ?ns and the other help pages it links to.

IRTFM
  • 258,963
  • 21
  • 364
  • 487