1

I am trying to create a quadratic prediction line for a quadratic model. I am using the Auto dataset that comes with R. I had no trouble creating the prediction line for a linear model. However, the quadratic model yields crazy looking lines. Here is my code.

# Linear Model
plot(Auto$horsepower, Auto$mpg,
     main = "MPG versus Horsepower",
     pch = 20)

lin_mod = lm(mpg ~ horsepower,
             data = Auto)
lin_pred = predict(lin_mod)


lines(
  Auto$horsepower, lin_pred,
  col = "blue", lwd = 2
)


# The Quadratic model
Auto$horsepower2 = Auto$horsepower^2
quad_model = lm(mpg ~ horsepower2,
                data = Auto)
quad_pred = predict(quad_model)

lines(
  Auto$horsepower,
  quad_pred,
  col = "red", lwd = 2
)

I am 99% sure that the issue is the prediction function. Why can't I produce a neat looking quadratic prediction curve? The following code I tried does not work—could it be related?:

quad_pred = predict(quad_model, data.frame(horsepower = Auto$horsepower))

Thanks!

im2wddrf
  • 551
  • 2
  • 5
  • 19

3 Answers3

1

The issue is that the x-axis values aren't sorted. It wouldn't matter if was a linear model but it would be noticeable if it was polynomial. I created a new sorted data set and it works fine:

library(ISLR) # To load data Auto

# Linear Model
plot(Auto$horsepower, Auto$mpg,
     main = "MPG versus Horsepower",
     pch = 20)

lin_mod = lm(mpg ~ horsepower,
             data = Auto)
lin_pred = predict(lin_mod)


lines(
  Auto$horsepower, lin_pred,
  col = "blue", lwd = 2
)


# The Quadratic model
Auto$horsepower2 = Auto$horsepower^2

# Sorting Auto by horsepower2
Auto2 <- Auto[order(Auto$horsepower2), ]
quad_model = lm(mpg ~ horsepower2,
                data = Auto2)


quad_pred = predict(quad_model)


lines(
  Auto2$horsepower,
  quad_pred,
  col = "red", lwd = 2
)
  • Thank you! It looks like you created an entire new data frame just to sort the squared, horsepower values. Why can't I create the sorted values within the prediction function? For example, as I mentioned above, I get an error when I run the line: `predict(quad_model, data.frame(horsepower = sort(Auto$horsepower)))` . I coulda sworn using a data.frame() function as an argument in the prediction function would allow one to control the desired x values to predict on—hence sorting it there instead of creating an Auto2 data frame altogether. – im2wddrf Jul 22 '18 at 07:49
  • Most likely the error you will get is `Error in eval(predvars, data, env) : object 'horsepower2' not found`. You should change it to this: `quad_pred = predict(quad_model, data.frame(horsepower2 = sort(Auto$horsepower^2)))` – see-king_of_knowledge Jul 22 '18 at 16:02
1

One option is to create the sequence of x-values for which you would like to plot the fitted lines. This can be useful if your data has a "gap" or if you wish to plot the fitted lines outside of the range of the x-variables.

# load dataset; if necessary run install.packages("ISLR")
data(Auto, package = "ISLR")

# since only 2 variables at issue, use short names
mpg <- Auto$mpg
hp  <- Auto$horsepower

# fit linear and quadratic models
lmod <- lm(mpg ~ hp)
qmod <- lm(mpg ~ hp + I(hp^2))

# plot the data
plot(x=hp, y=mpg, pch=20)

# use predict() to find coordinates of points to plot
x_coords <- seq(from=floor(min(hp)), to=ceiling(max(hp)), by=1)
y_coords_lmod <- predict(lmod, newdata=data.frame(hp=x_coords))
y_coords_qmod <- predict(qmod, newdata=data.frame(hp=x_coords))

# alternatively, calculate this manually using the fitted coefficients
y_coords_lmod <- coef(lmod)[1] + coef(lmod)[2]*x_coords
y_coords_qmod <- coef(qmod)[1] + coef(qmod)[2]*x_coords + coef(qmod)[3]*x_coords^2

# add the fitted lines to the plot
points(x=x_coords, y=y_coords_lmod, type="l", col="blue")
points(x=x_coords, y=y_coords_qmod, type="l", col="red")
DanY
  • 5,920
  • 1
  • 13
  • 33
  • Thanks! This looks pretty intuitive. I like how you manually created a sequence. I think this will help smooth out a prediction line/curve. – im2wddrf Jul 22 '18 at 07:55
1

Alternatively, using ggplot2:

ggplot(Auto, aes(x = horsepower, y = mpg)) + geom_point() +
          stat_smooth(aes(x = horsepower, y = mpg), method = "lm", formula = y ~ x, colour = "red") +
          stat_smooth(aes(x = horsepower, y = mpg), method = "lm", formula = y ~ poly(x, 2), colour = "blue")
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
  • Thank you! I appreciate showing an alternative method on ggplot. It looks real neat. I really like the prediction interval band. – im2wddrf Jul 22 '18 at 07:53