I'm learning R and trying to understand how lm() handles factor variables and how to make sense of the ANOVA table. I'm fairly new to statistics, so please be gentle with me.
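On the first point, how lm() handles a factor: a minimal sketch, assuming R's default treatment contrasts and using made-up toy scores (the vectors rating and score below are hypothetical, not the Rotten Tomatoes data). The factor is expanded into an intercept plus one 0/1 dummy column per non-reference level, so the intercept estimates the mean of the first level and each other coefficient estimates that level's difference from it.

```r
# Hypothetical toy data, NOT the real movie file: 10 movies in 4 rating groups
rating <- factor(c("G", "G", "PG", "PG", "PG", "PG-13", "PG-13", "R", "R", "R"),
                 levels = c("G", "PG", "PG-13", "R"))
score <- c(80, 70, 60, 65, 55, 50, 58, 72, 40, 61)

fit <- lm(score ~ rating)

# Design matrix: an intercept column plus dummy columns ratingPG, ratingPG-13, ratingR;
# "G" (the first level) is the reference and gets no column of its own
X <- model.matrix(fit)
print(X)

# Raw group means (in level order) next to the fitted coefficients
group_means <- sapply(split(score, rating), mean)
b <- coef(fit)
print(group_means)
print(b)

# Intercept = mean of the reference group; other coefficients = group mean minus that
intercept_ok <- isTRUE(all.equal(unname(b["(Intercept)"]), unname(group_means["G"])))
diffs_ok <- isTRUE(all.equal(unname(b[-1]), unname(group_means[-1] - group_means["G"])))
print(c(intercept_ok = intercept_ok, diffs_ok = diffs_ok))
```

Only the individual coefficients depend on which level is the reference; the rows of the anova() table do not.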
Here's some movie data from Rotten Tomatoes. I'm trying to model each movie's score using the mean score of its rating group, where there are 4 groups: G, PG, PG-13, and R.
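In a model like this, the fitted value for every movie is simply the mean score of its rating group. A quick check, again on hypothetical toy data (rating and score are made up here so the sketch runs without the download):

```r
# Hypothetical toy data, NOT the Rotten Tomatoes file
rating <- factor(c("G", "G", "PG", "PG", "PG", "PG-13", "PG-13", "R", "R", "R"),
                 levels = c("G", "PG", "PG-13", "R"))
score <- c(80, 70, 60, 65, 55, 50, 58, 72, 40, 61)

fit <- lm(score ~ rating)

# ave() replaces each score with the mean score of that movie's group
group_mean_per_movie <- ave(score, rating)

# The model's fitted values coincide with those group means
fitted_ok <- isTRUE(all.equal(unname(fitted(fit)), group_mean_per_movie))
print(fitted_ok)
```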
download.file("http://www.rossmanchance.com/iscam2/data/movies03RT.txt", destfile = "./movies.txt")
movies <- read.table("./movies.txt", sep = "\t", header = TRUE, quote = "")
lm1 <- lm(movies$score ~ as.factor(movies$rating))
anova(lm1)
And here's the ANOVA output:
## Analysis of Variance Table
##
## Response: movies$score
##                           Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(movies$rating)   3    570     190    0.92   0.43
## Residuals                136  28149     207
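One way to see how the rows fit together: the two Sum Sq entries add up to the total sum of squares of the scores around their overall mean (here 570 + 28149 = 28719 on 3 + 136 = 139 Df, up to the rounding in the printed table), and F value is the ratio of the two Mean Sq entries (190 / 207 ≈ 0.92). A sketch that checks this on hypothetical toy data (rating and score are made up, not the real file):

```r
# Hypothetical toy data, NOT the Rotten Tomatoes file
rating <- factor(c("G", "G", "PG", "PG", "PG", "PG-13", "PG-13", "R", "R", "R"),
                 levels = c("G", "PG", "PG-13", "R"))
score <- c(80, 70, 60, 65, 55, 50, 58, 72, 40, 61)

tab <- anova(lm(score ~ rating))
print(tab)

# Total sum of squares: squared deviations of every score from the grand mean
ss_total <- sum((score - mean(score))^2)
ss_ok <- isTRUE(all.equal(ss_total, sum(tab[["Sum Sq"]])))

# Residual Sum Sq: squared deviations of each score from its own group mean
ss_resid <- sum((score - ave(score, rating))^2)
resid_ok <- isTRUE(all.equal(ss_resid, tab[2, "Sum Sq"]))

# F = Mean Sq(factor) / Mean Sq(Residuals); p value is the upper tail of the F distribution
f_manual <- tab[1, "Mean Sq"] / tab[2, "Mean Sq"]
p_manual <- pf(f_manual, tab[1, "Df"], tab[2, "Df"], lower.tail = FALSE)
f_ok <- isTRUE(all.equal(f_manual, tab[1, "F value"]))
p_ok <- isTRUE(all.equal(p_manual, tab[1, "Pr(>F)"]))

print(c(ss_ok = ss_ok, resid_ok = resid_ok, f_ok = f_ok, p_ok = p_ok))
```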
I understand how to get all the numbers in this table, EXCEPT Sum Sq and Mean Sq for as.factor(movies$rating). Can someone please explain how that Sum Sq is calculated from my data? I know that Mean Sq is just Sum Sq divided by Df.
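For a one-factor model like this, the Sum Sq on the factor row is the between-group sum of squares: for each rating group, take the squared difference between the group mean and the overall mean, weight it by the number of movies in that group, and add these up over the 4 groups. Mean Sq then divides by Df = 4 - 1 = 3. A sketch on hypothetical toy data (rating and score are made up, not the real movie file), compared against anova():

```r
# Hypothetical toy data, NOT the Rotten Tomatoes file
rating <- factor(c("G", "G", "PG", "PG", "PG", "PG-13", "PG-13", "R", "R", "R"),
                 levels = c("G", "PG", "PG-13", "R"))
score <- c(80, 70, 60, 65, 55, 50, 58, 72, 40, 61)

grand_mean  <- mean(score)
group_means <- sapply(split(score, rating), mean)    # one mean per level, in level order
group_sizes <- sapply(split(score, rating), length)  # number of movies per level

# Between-group sum of squares: size-weighted squared distance of each group mean from the grand mean
ss_rating <- sum(group_sizes * (group_means - grand_mean)^2)

# Equivalent form: squared distance of every fitted value (its group mean) from the grand mean
ss_rating_alt <- sum((ave(score, rating) - grand_mean)^2)

# Mean Sq = Sum Sq / Df, with Df = number of groups - 1
df_rating <- nlevels(rating) - 1
ms_rating <- ss_rating / df_rating

tab <- anova(lm(score ~ rating))
print(c(by_hand = ss_rating, anova = tab[1, "Sum Sq"]))

ss_ok  <- isTRUE(all.equal(ss_rating, tab[1, "Sum Sq"]))
alt_ok <- isTRUE(all.equal(ss_rating, ss_rating_alt))
ms_ok  <- isTRUE(all.equal(ms_rating, tab[1, "Mean Sq"]))
```

Applied to the real data, replacing score and rating with movies$score and as.factor(movies$rating) should reproduce the 570 and 190 in the table above.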