R: lm() with factors. Don't understand how ANOVA table calculates "Sum Sq"

Question

I'm learning R and trying to understand how lm() handles factor variables and how to make sense of the ANOVA table. I'm fairly new to statistics, so please be gentle with me.

Here's some movie data from Rotten Tomatoes. I'm trying to model the score of each movie based on the mean scores for all of the movies in 4 groups: those rated G, PG, PG-13, and R.

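# Download the tab-separated Rotten Tomatoes data and read it in
download.file("http://www.rossmanchance.com/iscam2/data/movies03RT.txt", destfile = "./movies.txt")
movies <- read.table("./movies.txt", sep = "\t", header = TRUE, quote = "")
# Regress score on rating, treating rating as a categorical factor
lm1 <- lm(movies$score ~ as.factor(movies$rating))
# ANOVA table for the fitted model
anova(lm1)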
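Since part of the question is how lm() treats a factor, here is a minimal sketch on a small made-up data set (the `score` and `rating` values are hypothetical, not taken from movies03RT.txt). It assumes R's default treatment contrasts, under which the first factor level is the baseline: the design matrix gets an intercept column plus one 0/1 dummy column per remaining level, and every fitted value ends up being the mean of its group.

```r
# Hypothetical toy data: 3 rating groups of unequal size
score  <- c(60, 70, 65, 80, 85, 50, 55, 45, 40)
rating <- factor(c("G", "G", "G", "PG", "PG", "R", "R", "R", "R"))

fit <- lm(score ~ rating)

# Default treatment contrasts: intercept column plus 0/1 dummies for PG and R
# ("G", the first level, is the baseline and gets no column of its own)
X <- model.matrix(fit)
print(X)

# Intercept = mean of G (65); ratingPG and ratingR = differences from that mean
print(coef(fit))

# Each fitted value is its group mean
# (ave() replaces each score by the mean of its rating group)
all.equal(unname(fitted(fit)), ave(score, rating))
```

Because the fitted values are just group means, the residuals are each movie's distance from its own group mean, which is what feeds the Residuals row of the ANOVA table.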

and the ANOVA output:

## Analysis of Variance Table
## 
## Response: movies$score
##                           Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(movies$rating)   3    570     190    0.92   0.43
## Residuals                136  28149     207

I understand how to get all the numbers in this table EXCEPT Sum Sq and Mean Sq for as.factor(movies$rating). Can someone please explain how that Sum Sq is calculated from my data? I know that Mean Sq is just Sum Sq divided by Df.
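The columns other than Sum Sq can be checked directly from the printed table. This sketch just redoes that arithmetic in R; `pf()` with `lower.tail = FALSE` gives the upper-tail F probability shown as Pr(>F). Since the inputs are the rounded values as printed, the results only match the table to about two decimal places.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_rating <- 570;   df_rating <- 3
ss_resid  <- 28149; df_resid  <- 136

ms_rating <- ss_rating / df_rating   # Mean Sq for the factor: 190
ms_resid  <- ss_resid / df_resid     # Mean Sq for the residuals: about 207
f_value   <- ms_rating / ms_resid    # F value: about 0.92

# Pr(>F): upper-tail probability of an F(3, 136) distribution
p_value <- pf(f_value, df_rating, df_resid, lower.tail = FALSE)
round(c(ms_rating = ms_rating, ms_resid = ms_resid, F = f_value, p = p_value), 2)
```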

I think this question is more appropriate for http://stats.stackexchange.com/ – NPE, Feb 13 '13 at 17:59

Answer (answered Feb 13 '13 at 18:09)

There are various ways to get that. One of them is to use the equation:

http://en.wikipedia.org/wiki/Sum_of_squares_(statistics)

SS_total = SS_reg + SS_error
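For a single factor such as the rating, each term has a concrete form. Writing y_gi for the score of movie i in rating group g, n_g for the number of movies in that group, and using bars for the group mean and the overall mean (notation introduced here for illustration, not from the linked page):

```latex
\underbrace{\sum_{g}\sum_{i=1}^{n_g} \left(y_{gi} - \bar{y}\right)^2}_{SS_{\mathrm{total}}}
= \underbrace{\sum_{g} n_g \left(\bar{y}_g - \bar{y}\right)^2}_{SS_{\mathrm{reg}}}
+ \underbrace{\sum_{g}\sum_{i=1}^{n_g} \left(y_{gi} - \bar{y}_g\right)^2}_{SS_{\mathrm{error}}}
```

The cross term drops out because the deviations from the group mean sum to zero within each group. Since lm() fits each movie with its group mean, SS_reg here is the Sum Sq on the as.factor(movies$rating) row.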

So the factor's Sum Sq is the total sum of squares minus the residual (error) sum of squares:

# Factor Sum Sq = SS_total - SS_error
y <- movies$score
sum((y - mean(y))^2) - sum(residuals(lm1)^2)
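Another of those ways is to build the factor's Sum Sq straight from the group means: weight each group's squared distance from the overall mean by the number of movies in that group. The sketch below uses hypothetical toy data in place of `movies$score` and `movies$rating`, so it runs without the download, and checks the result against both the subtraction above and the Sum Sq column from anova().

```r
# Hypothetical toy data standing in for movies$score and movies$rating
score  <- c(60, 70, 65, 80, 85, 50, 55, 45, 40)
rating <- factor(c("G", "G", "G", "PG", "PG", "R", "R", "R", "R"))
fit <- lm(score ~ rating)

grand_mean  <- mean(score)
group_means <- tapply(score, rating, mean)   # mean score per rating level
group_sizes <- table(rating)                 # number of movies per rating level

# Between-group sum of squares: size-weighted squared deviations of group means
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Same quantity as SS_total - SS_error, the approach used in this answer
ss_diff <- sum((score - grand_mean)^2) - sum(residuals(fit)^2)

# Same quantity as reported by anova() for the factor row
ss_anova <- anova(fit)[["Sum Sq"]][1]

c(between = ss_between, difference = ss_diff, anova = ss_anova)
```

All three numbers agree, and Mean Sq for the factor is then this value divided by (number of levels - 1).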


liuminzhao


Please note that there is an ongoing quasi-religious war (at least a disagreement with vehemently argued opposing positions) between SAS and R authors regarding how to properly construct and partition sums-of-squares. – IRTFM Feb 13 '13 at 21:23
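As a rough illustration of what that disagreement is about (a sketch, not a summary of either side's position): anova() on an lm fit reports sequential, or "Type I", sums of squares, where each term is credited only with what it explains beyond the terms listed before it. With a single factor, as in the question, order cannot matter, but with two correlated factors a term's Sum Sq changes with its position in the formula. The factors `a`, `b` and response `y` below are made up for illustration.

```r
# Hypothetical unbalanced design: a and b are deliberately correlated
set.seed(1)
a <- factor(c("x", "x", "x", "x", "y", "y", "y", "y"))
b <- factor(c("p", "p", "p", "q", "p", "q", "q", "q"))
y <- 2 * as.numeric(a == "y") + 3 * as.numeric(b == "q") + rnorm(8)

# Sequential (Type I) Sum Sq for a, entered first vs. entered after b
ss_a_first  <- anova(lm(y ~ a + b))["a", "Sum Sq"]
ss_a_second <- anova(lm(y ~ b + a))["a", "Sum Sq"]

c(a_entered_first = ss_a_first, a_entered_second = ss_a_second)
```

Entered first, a also gets credit for variation it shares with b; entered second, it only gets what is left after b.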
