0

although I've found plenty of help regarding fitting models in general, I keep running into a specific issue with my data because of the way it's organized. It's from an intro stats book and is supposed to represent sample data of errors as a function of milligrams of some drug.

|-----|-------|-------|-------|
| 0mg | 100mg | 200mg | 300mg |
|-----|-------|-------|-------|
| 25  |  16   |   6   |   8   |
| 19  |  15   |  14   |  18   |
| 22  |  19   |   9   |   9   |
| 15  |  11   |   5   |  10   |
| 16  |  14   |   9   |  12   |
| 20  |  23   |  11   |  13   |

The data looks like it dips around group C, then goes up a bit for D, hence looking for a quadratic fit.

I've tried the following:

scores = c(25, 19, 22, 15, 16, 20,
           16, 15, 19, 11, 14, 23,
            6, 14,  9,  5,  9, 11,
            8, 18,  9, 10, 12, 13)

x_groups = rep(c(0,100, 200, 300), each = 6)
scores.quadratic = lm(scores ~ poly(x_groups, 2, raw = TRUE))

I can then use the summary() function to view the results. I'm confused about the lm() function and how it's supposed to fit a quadratic function. My understanding is that it will take each index in x_groups and square that, then use a linear fit with that new vector, but that doesn't seem correct to me.

Can someone provide feedback on how this is supposed to fit a quadratic to my data, or if it's not doing that, please help me understand where I'm going wrong.

Thank you.

  • 1
    Quadratic is a specific case of a polynomial formula, but it has order = 2. A polynomial fit with order = n of a variable `x` will fit `intercept + x + x^2 + x^3 + ... + x^n`. Therefore, the quadratic will fit `intercept + x + x^2` and that's exactly the coefficients you get in your model's output. It looks like you expected it to be `intercept + x^2`. – AntoniosK Dec 09 '17 at 09:35

1 Answers1

2

Let's go through your way of thinking step by step. First, you spot this dip via your numbers for group C. The best way to visualise this is

library(ggplot2)
library(dplyr)

scores = c(25, 19, 22, 15, 16, 20,
           16, 15, 19, 11, 14, 23,
           6, 14,  9,  5,  9, 11,
           8, 18,  9, 10, 12, 13)

x_groups = rep(c(0,100, 200, 300), each = 6)

# create dataset
d1 = data.frame(scores, x_groups) 

# calcuate average scores for each group
d2 = d1 %>% group_by(x_groups) %>% summarise(Avg = mean(scores))

# plot them
ggplot() + 
  geom_point(data = d1, aes(x_groups, scores)) +
  geom_line(data = d2, aes(x_groups, Avg), col="blue")

enter image description here

Now you can actually see the dip and that's the pattern you want to model.

Then, you want to fit your quadratic model. Keep in mind that quadratic is a specific case of a polynomial formula, but it has order = 2. A polynomial fit with order = n of a variable x will fit intercept + x + x^2 + x^3 + ... + x^n. Therefore, the quadratic will fit intercept + x + x^2 and that's exactly the coefficients you get in your model's output:

scores.quadratic = lm(scores ~ poly(x_groups, 2, raw = TRUE))
summary(scores.quadratic)

# Call:
#   lm(formula = scores ~ poly(x_groups, 2, raw = TRUE))
# 
# Residuals:
#   Min      1Q  Median      3Q     Max 
# -6.1250 -2.3333 -0.2083  1.8542  8.7917 
# 
# Coefficients:
#                                    Estimate Std. Error t value Pr(>|t|)    
#   (Intercept)                    20.2083333  1.5925328  12.689 2.58e-11 ***
#   poly(x_groups, 2, raw = TRUE)1 -0.0745833  0.0255747  -2.916  0.00825 ** 
#   poly(x_groups, 2, raw = TRUE)2  0.0001458  0.0000817   1.785  0.08870 .  
# ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 4.002 on 21 degrees of freedom
# Multiple R-squared:  0.4999,  Adjusted R-squared:  0.4523 
# F-statistic:  10.5 on 2 and 21 DF,  p-value: 0.0006919

The coefficient of the quadratic term is 0.0001458 ,close to zero , but statistically significantly different than zero at a 0.1 level (p value = 0.08870). Therefore, the model kind of feels that there's a dip.

You can plot the fit like this:

# plot the model
ggplot(d1, aes(x_groups, scores)) + 
  geom_point() +
  geom_smooth(formula = y ~ poly(x, 2, raw = TRUE),
              method = "lm")

You can see this as a smoothed version of the real pattern (1st plot).

enter image description here

AntoniosK
  • 15,991
  • 2
  • 19
  • 32