0

For the following data:

require(dplyr)
require(ggplot2)

ds <- read.table(header = TRUE, text ="
obs  id year attend
                 1  47 2000      1
                 2  47 2001      3
                 3  47 2002      5
                 4  47 2003      8
                 5  47 2004      6
                 6  47 2005      4
                 7  47 2006      2
                 8  47 2007      1
                 9  47 2008      2
                 10 47 2009      3
                 11 47 2010      4
                 12 47 2011      5

                 ")
print(ds)

I would like to compute predicted values of linear models

linear<-    predict(lm(attend ~ year, ds))
quadratic<- predict(lm(attend ~ year + I(year^2),ds))
cubic<-     predict(lm(attend ~ year + I(year^2) + I(year^3),ds))

ds<- ds %>% dplyr::mutate(linear=linear, quadratic=quadratic, cubic=cubic)
print(ds)

   obs id year attend   linear quadratic    cubic
1    1 47 2000      1 3.820513  3.500000 3.500000
2    2 47 2001      3 3.792541  3.646853 3.646853
3    3 47 2002      5 3.764569  3.758741 3.758741
4    4 47 2003      8 3.736597  3.835664 3.835664
5    5 47 2004      6 3.708625  3.877622 3.877622
6    6 47 2005      4 3.680653  3.884615 3.884615
7    7 47 2006      2 3.652681  3.856643 3.856643
8    8 47 2007      1 3.624709  3.793706 3.793706
9    9 47 2008      2 3.596737  3.695804 3.695804
10  10 47 2009      3 3.568765  3.562937 3.562937
11  11 47 2010      4 3.540793  3.395105 3.395105
12  12 47 2011      5 3.512821  3.192308 3.192308

Question: Despite the fact that time series has a clear cubical shape, the quadratic and cubic predictions are identical. Why? Is this a mistake?

rcs
  • 67,191
  • 22
  • 172
  • 153
andrey
  • 2,029
  • 2
  • 18
  • 23
  • 1
    Yet another error that might have been prevented by using `poly()`. When will they ever learn. – IRTFM Jun 25 '14 at 06:16

1 Answers1

6

This is due to the fact 2011^3 is a very big number (greater tha and this is causing the coeffiicent to be returned as NA. If you had inspected the models, you would have noticed this.

coef(lm(attend ~ year + I(year^2) + I(year^3),ds))
#   (Intercept)          year     I(year^2)     I(year^3) 
# -7.025524e+04  7.009441e+01 -1.748252e-02            NA 

It is more sensible to use poly to create orthogonal polynomials

linear<-    predict(lm(attend ~ year, ds))
quadratic<- predict(lm(attend ~ poly(year,2),ds))
cubic<-     predict(lm(attend ~ poly(year,3),ds))


ds<- (ds %>% dplyr::mutate(linear=linear, quadratic=quadratic, cubic=cubic))
ds
# obs id year attend   linear quadratic     cubic
# 1    1 47 2000      1 3.820513  3.500000 0.7435897
# 2    2 47 2001      3 3.792541  3.646853 3.8974359
# 3    3 47 2002      5 3.764569  3.758741 5.5128205
# 4    4 47 2003      8 3.736597  3.835664 5.9238539
# 5    5 47 2004      6 3.708625  3.877622 5.4646465
# 6    6 47 2005      4 3.680653  3.884615 4.4693085
# 7    7 47 2006      2 3.652681  3.856643 3.2719503
# 8    8 47 2007      1 3.624709  3.793706 2.2066822
# 9    9 47 2008      2 3.596737  3.695804 1.6076146
# 10  10 47 2009      3 3.568765  3.562937 1.8088578
# 11  11 47 2010      4 3.540793  3.395105 3.1445221
# 12  12 47 2011      5 3.512821  3.192308 5.9487179
mnel
  • 113,303
  • 27
  • 265
  • 254