0

I am dealing with a dataset that involves 9 different genotypes, arranged into 3 different classes. About 20 size measurements from each genotype have been recorded.

I tried running a two-way anova (after a one-way anova established that size differed significantly between genotypes), in order to analyse the differences between the 3 classes as well as the different genotypes.

I used the function aov(size~genotype*class,data=x) The summary table I obtained only has a row for genotype, and I can't see class or genotype:class anywhere, as I would have expected. The table I obtain is identical to the one I get when I just run aov(size~genotype, data=x)

What have I done wrong? Even if the class/class:genotype wouldn't change the significance of the results, they should still show up in the anova summary table?

Sven Hohenstein
  • 80,497
  • 17
  • 145
  • 168
  • 1
    Yes, they normally should; run the example in `?aov`. Which means that without you producing an example data set we will not be able to help you. – January Nov 19 '12 at 11:34
  • I'm sorry I don't understand what you mean by producing an example data set? – user1835550 Nov 19 '12 at 11:38
  • Something that I would be able to run and reproduce your error. Either give me your original data, or create some fake data that produces the same problem. I do not have your problem when I run `aov()` on my data, you see. – January Nov 19 '12 at 11:40
  • I have noticed that each genotype is associated with a different amount of measurements (eg some have 13, some have 26)...could this be causing the issue? Should I work with means instead? – user1835550 Nov 19 '12 at 11:54
  • No, you should provide us with example data instead. – January Nov 19 '12 at 12:14

1 Answers1

1

In short - you can't fit the model you're trying to fit... at least if I'm understanding your data correctly. My understanding is that you have something similar to this:

dat <- data.frame(size = rnorm(27), genotype = gl(9,3), class = gl(3, 9))

> dat <- data.frame(size = rnorm(27), genotype = gl(9,3), class = gl(3, 9))
> dat
          size genotype class
1   1.44189249        1     1
2   1.05766532        1     1
3   0.08133568        1     1
4   0.36642288        2     1
5   0.93266571        2     1
6  -0.64031787        2     1
7   0.33361892        3     1
8   0.53315507        3     1
9   0.26851394        3     1
10  0.05062280        4     2
11 -0.30924511        4     2
12 -0.61460429        4     2
13 -0.18901238        5     2
14  0.58881858        5     2
15  0.58625502        5     2
16  0.52002793        6     2
17  1.23862937        6     2
18 -2.02333160        6     2
19 -0.09918607        7     3
20  0.65947932        7     3
21 -0.65440238        7     3
22  0.10923036        8     3
23  0.76845484        8     3
24 -0.24804574        8     3
25 -0.30890950        9     3
26 -2.82056870        9     3
27  0.56828147        9     3

(The main thing I'm looking at is how genotype and class relate - not the actual values for size or the sample sizes for each genotype*class combination)

If each genotype is entirely contained within a single class then you can't separate the genotype effect from the class effect. Hopefully this makes sense to you - if not let me illustrate with a smaller example. First off - since each genotype is entirely in one class we can't fit an interaction - that just doesn't make sense at all. The interaction would be useful if genotypes could be a part of at least two classes because it would allow us to attribute a different effect for genotype based on the class it's in for the observation. But since each genotype is only in one class... fitting a model with interactions is out.

Now to see why we can't fit a class effect just consider class 1 which contains genotypes 1-3. The thing to recognize is that with linear models (and ANOVA is just a special case of a linear model) the thing we're modeling is the conditional means in the different groups - and we try to partition this into certain effects if possible. So any model that gives us the same group means is essentially equivalent. Pretend for a second that the effect for class 1 is c, and the effects for genotypes 1-3 are x, y, and z (respectively). Then the value for the group genotype1/class1 = c+x, for genotype2/class1 = c+y, for genotype3/class1 = c+z. But notice here that we could just as easily said the class1 effect is 0 and then said the effects for genotypes 1-3 are c+x, c+y, c+z (respectively). So class is completely useless in this situation. There is no way to separate the class effect since the genotypes are completely nested inside of class. So we can only fit a model that has separate effects for genotypes if we want to fit a completely fixed effects model.

Dason
  • 60,663
  • 9
  • 131
  • 148
  • This helped very much! Thank YoU! Is there a model that you could suggest that takes into account both Genome and Class? Or will the result always be equivalent to what I have obtained? – user1835550 Nov 21 '12 at 19:08
  • Or, to be more precise, is there a way that I could test how the classes are different AND how the genotypes in every class differ? – user1835550 Nov 21 '12 at 21:25
  • Possibly some sort of random effects model but that is more of a stats question and less of a programming question. You would better off asking something like that at http://stats.stackexchange.com/ – Dason Nov 23 '12 at 04:24