
How do I calculate an analysis of variance in R to see whether the variance among different groups is larger than the variance within each group?

```
G1     G2     G3     G4     G5     G6     G7
20.49  22.94  23.06  16.9   16.72  20.65  21.66
23.62  22.15  20.05  22.48  19.32  18.79  20.37
20.51  21.16  22.47  22.48  25.66  21.25  21.93
15.09  20.98  13.9   19.79  20.74  14.05  20.14
21.75  21.11  19.32  19.56  25.82  18.39  20.23
```

This is what I did. Is this correct?

```r
# One vector per group (five observations each)
g1 <- c(20.49, 23.62, 20.51, 15.09, 21.75)
g2 <- c(22.94, 22.15, 21.16, 20.98, 21.11)
g3 <- c(23.06, 20.05, 22.47, 13.9, 19.32)
g4 <- c(16.9, 22.48, 22.48, 19.79, 19.56)
g5 <- c(16.72, 19.32, 25.66, 20.74, 25.82)
g6 <- c(20.65, 18.79, 21.25, 14.05, 18.39)
g7 <- c(21.66, 20.37, 21.93, 20.14, 20.23)

# Put the groups side by side, then stack them into long format:
# stack() returns a "values" column and a group factor "ind"
Combined_g <- data.frame(g1, g2, g3, g4, g5, g6, g7)
stacked_g <- stack(Combined_g)
Anova_results <- aov(values ~ ind, data = stacked_g)
summary(Anova_results)
```

```
            Df Sum Sq Mean Sq F value Pr(>F)
ind          6  34.86   5.810    0.75  0.615
Residuals   28 216.92   7.747
```
Mat Vicky
  • Try to specify what you have done for solving your problem and where you are stuck – lrleon Oct 13 '18 at 18:31
  • I have put them in a stacked format, so the values are all in one column with their corresponding groups next to them. I did the aov function calculation too, but I don't know how to calculate the two different variances (among and within). Or does the aov function already calculate them automatically? – Mat Vicky Oct 13 '18 at 19:20
  • Hi Mat, and welcome to SO! Please keep in mind that SO is not a code-writing service, so you are expected to provide the code you have tried yourself. Otherwise you risk having your posts down-voted or closed. – not2qubit Oct 13 '18 at 19:28

2 Answers


You need to have your data in long format, such as:

```
value  factor
20.49  G1
23.62  G1
...
22.94  G2
...
20.23  G7
```

and then you can use the aov() function:

```r
fit <- aov(value ~ factor, data = yourdataframe)
summary(fit)
```
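The answer above does not show how to get the data into long format. A minimal base-R sketch, assuming the seven groups of five values from the question; `yourdataframe`, `value` and `factor` are the placeholder names from this answer, used here only for illustration, and the group labels are built with `rep()`:

```r
# Illustrative only: the 35 observations from the question, group by group
values <- c(20.49, 23.62, 20.51, 15.09, 21.75,   # G1
            22.94, 22.15, 21.16, 20.98, 21.11,   # G2
            23.06, 20.05, 22.47, 13.90, 19.32,   # G3
            16.90, 22.48, 22.48, 19.79, 19.56,   # G4
            16.72, 19.32, 25.66, 20.74, 25.82,   # G5
            20.65, 18.79, 21.25, 14.05, 18.39,   # G6
            21.66, 20.37, 21.93, 20.14, 20.23)   # G7

# Long format: one row per observation, group label repeated 5 times per group
yourdataframe <- data.frame(
  value  = values,
  factor = factor(rep(paste0("G", 1:7), each = 5))
)

fit <- aov(value ~ factor, data = yourdataframe)
summary(fit)
```

`rep(..., each = 5)` assumes every group has exactly five observations; with unequal group sizes you would pass a vector of per-group counts through `times` instead.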
jyr

A complete answer to the question, including a package that converts the wide-format data into a narrow (long) tidy data set, is as follows.

First, load the data into a data frame and use tidyr::gather() to convert it to narrow format.

```r
library(tidyr) # needed to convert to narrow-format tidy data

rawData <- "G1     G2     G3     G4     G5     G6     G7
20.49  22.94  23.06  16.9   16.72  20.65  21.66
23.62  22.15  20.05  22.48  19.32  18.79  20.37
20.51  21.16  22.47  22.48  25.66  21.25  21.93
15.09  20.98  13.9   19.79  20.74  14.05  20.14
21.75  21.11  19.32  19.56  25.82  18.39  20.23"

data <- read.table(text = rawData, header = TRUE, stringsAsFactors = TRUE)
# Gather all seven columns into a key column "group" and a value column "value".
# gather() still works, but it is superseded by pivot_longer() in tidyr >= 1.0.0.
narrowData <- gather(data, key = "group", value = "value")
```
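If you would rather avoid the tidyr dependency, base R's `stack()` (the function used in the question) performs the same wide-to-long reshaping. A sketch, assuming the same seven columns; the renaming step is only there so the column names match the `gather()` output:

```r
rawData <- "G1     G2     G3     G4     G5     G6     G7
20.49  22.94  23.06  16.9   16.72  20.65  21.66
23.62  22.15  20.05  22.48  19.32  18.79  20.37
20.51  21.16  22.47  22.48  25.66  21.25  21.93
15.09  20.98  13.9   19.79  20.74  14.05  20.14
21.75  21.11  19.32  19.56  25.82  18.39  20.23"

data <- read.table(text = rawData, header = TRUE)

# stack() turns every column into one "values" column plus a factor "ind"
# holding the original column name of each value
narrowData <- stack(data)

# Rename to the column names produced by gather(key = "group", value = "value")
names(narrowData) <- c("value", "group")
head(narrowData)
```

Because `stack()` names its output columns `values` and `ind`, the question's `aov(values ~ ind, data = stacked_g)` call is fitting the same one-way model.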

After conversion, print the first few rows.

```
> head(narrowData)
  group value
1    G1 20.49
2    G1 23.62
3    G1 20.51
4    G1 15.09
5    G1 21.75
6    G2 22.94
```

Now, use the aov() function to produce an analysis of variance and print the model summary statistics.

```r
aovModel <- aov(value ~ group, data = narrowData)
aovModel           # sums of squares and degrees of freedom
summary(aovModel)  # full ANOVA table, including the F test
```

...and the output:

```
> aovModel
Call:
   aov(formula = value ~ group, data = narrowData)

Terms:
                    group Residuals
Sum of Squares   34.86206 216.92436
Deg. of Freedom         6        28

Residual standard error: 2.783397
Estimated effects may be unbalanced
> summary(aovModel)
            Df Sum Sq Mean Sq F value Pr(>F)
group        6  34.86   5.810    0.75  0.615
Residuals   28 216.92   7.747
```

Interpreting the results

Analysis of Variance tests the following hypotheses:

  • Null hypothesis: mean(group1) = mean(group2) = ... = mean(group7)
  • Alternative hypothesis: at least one group mean differs from the others
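The F statistic in the summary table is exactly the comparison the question asks about: the variance among the group means divided by the variance within the groups. With k = 7 groups, N = 35 observations, n_i = 5 observations per group, y_ij the j-th value in group i, ȳ_i the mean of group i and ȳ the grand mean, and plugging in the sums of squares from the output:

```latex
\begin{aligned}
SS_{\text{between}} &= \sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2, \qquad
SS_{\text{within}} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \\
F &= \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}
   = \frac{34.86/6}{216.92/28}
   = \frac{5.810}{7.747} \approx 0.75
\end{aligned}
```

An F well above 1 would mean the group means are more spread out than the within-group scatter alone would explain; here F is below 1.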

If we are willing to accept a 5% chance of a Type I error (rejecting the null hypothesis when it is in fact true), we set the significance level at α = 0.05 and reject the null hypothesis only when the p-value is below 0.05.

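Since the p-value of the F test (0.615) is greater than 0.05, we fail to reject the null hypothesis that the group means are equal. Note that this F test is not a test for homogeneity of variances: it compares the variance among the group means (the Mean Sq for group, 5.810) with the variance within the groups (the Mean Sq for Residuals, 7.747). Here F = 5.810 / 7.747 ≈ 0.75, so the among-group variance is not significantly larger than the within-group variance.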
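To answer the question in the comments directly (yes, aov() computes the among-group and within-group variances for you; they are the two Mean Sq values), the sketch below recomputes them by hand in base R and checks the result against aov(). The variable names are illustrative, and the p-value is the upper tail of the F distribution from `pf()`:

```r
# The 35 observations from the question in long format (illustrative data frame)
narrowData <- data.frame(
  value = c(20.49, 23.62, 20.51, 15.09, 21.75,
            22.94, 22.15, 21.16, 20.98, 21.11,
            23.06, 20.05, 22.47, 13.90, 19.32,
            16.90, 22.48, 22.48, 19.79, 19.56,
            16.72, 19.32, 25.66, 20.74, 25.82,
            20.65, 18.79, 21.25, 14.05, 18.39,
            21.66, 20.37, 21.93, 20.14, 20.23),
  group = factor(rep(paste0("G", 1:7), each = 5))
)

k <- nlevels(narrowData$group)   # number of groups
N <- nrow(narrowData)            # number of observations
grand_mean  <- mean(narrowData$value)
group_means <- ave(narrowData$value, narrowData$group)  # each row's group mean

ss_between <- sum((group_means - grand_mean)^2)        # among-group sum of squares
ss_within  <- sum((narrowData$value - group_means)^2)  # within-group sum of squares

ms_between <- ss_between / (k - 1)   # among-group variance (Mean Sq, group)
ms_within  <- ss_within / (N - k)    # within-group variance (Mean Sq, Residuals)
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

# Compare with the table produced by aov()
aov_tab <- summary(aov(value ~ group, data = narrowData))[[1]]
all.equal(f_stat, aov_tab[["F value"]][1])

alpha <- 0.05
p_value < alpha   # FALSE for this data: fail to reject equal means
```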

Len Greski
  • Thank you so much! That helped a lot. And since the p-value is greater than alpha = 0.05, the variances are not significantly different, right? Or is it the other way around? – Mat Vicky Oct 14 '18 at 03:37
  • @MatVicky - I added a section on interpreting the results to my answer. If it was helpful, please accept the answer and upvote it. Thanks! – Len Greski Oct 14 '18 at 13:17
  • The interpretation makes sense! thanks for the help again. – Mat Vicky Oct 14 '18 at 19:42
  • @MatVicky - if the answer helped, please accept it. Thanks. – Len Greski Oct 14 '18 at 21:36