If you want to compare three data sets, organize them as a two-column data frame: the dependent variable (y) and the grouping variable (group). Two separate vectors also work, but a data frame makes it easy to see which value belongs to which group. In addition, it is a good idea to encode the grouping variable as a factor. More details can be found in statistics and R textbooks.
first <- c(30000, 34000, 36000, 38000, 40000)
third <- c(30000, 35000, 37000, 38000, 40000)
fifth <- c(40000, 41000, 43000, 44000, 50000)
# organize the data and the grouping variable as a data frame
mydata <- data.frame(
  y = c(first, third, fifth),
  group = factor(rep(c("first", "third", "fifth"), each = 5))
)
## show structure of the data
mydata
## fit linear model and perform anova
m <- lm(y ~ group, data=mydata)
anova(m)
## don't forget diagnostics
par(mfrow=c(2, 2))
plot(m)
The result of anova(m) is then indeed:
> anova(m)
Analysis of Variance Table
Response: y
          Df    Sum Sq   Mean Sq F value  Pr(>F)
group      2 203200000 101600000  6.8341 0.01044 *
Residuals 12 178400000  14866667
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
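To see where the numbers in this table come from, the F value can also be reproduced by hand from the sums of squares. The following is only a minimal sketch in base R using the same data; the helper names (group_means, ss_between, ss_within and so on) are my own and not part of the original answer.

```r
# Same data as in the example above
first <- c(30000, 34000, 36000, 38000, 40000)
third <- c(30000, 35000, 37000, 38000, 40000)
fifth <- c(40000, 41000, 43000, 44000, 50000)
y     <- c(first, third, fifth)
group <- factor(rep(c("first", "third", "fifth"), each = 5))

## group means, repeated so that every observation carries its own group mean
## (ave() uses mean() by default)
group_means <- ave(y, group)

## between-group and within-group (residual) sums of squares
ss_between <- sum((group_means - mean(y))^2)
ss_within  <- sum((y - group_means)^2)

## degrees of freedom: (number of groups - 1) and (observations - number of groups)
df_between <- nlevels(group) - 1
df_within  <- length(y) - nlevels(group)

## F statistic = ratio of the mean squares; p-value from the upper tail of F
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

c(F = f_value, p = p_value)
```

The two sums of squares and the degrees of freedom should match the "group" and "Residuals" rows of the table, and F and p should agree with the last two columns up to rounding.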