
I need some help performing an N-way ANOVA in R to capture interdependencies among different factors. My data contain around 100 factors, and I am using the following code to perform the ANOVA.

model.lm<-lm(y~., data=data)
anova(model.lm)

As far as I know (I may be wrong), this performs a 1-way ANOVA on each factor alone. For some reason, I need to perform an N-way ANOVA across all 100 factors, i.e. from x1 to x100. Do I need to specify each factor as follows, or is there a shorthand notation for this?

model.lm<-lm(y~x1*x2*x3....,x100, data=data)
anova(model.lm)
Shahzad
  • You need to clarify what you mean by an `n-way` ANOVA. `lm(y~.` will fit all the factors as *main effects* with no interactions, not as individual models as your wording suggests you think. Do you want to fit all interactions between the 100 factors? (I really hope the answer to this is no.) – mnel Nov 26 '12 at 00:59
  • I think you are incorrect, from what I understand of your question the `y~.` should be what you're after. – Tyler Rinker Nov 26 '12 at 01:00
  • @TylerRinker Not really? That notation doesn't take care of interactions - which is what mnel's comment is asking about. (And I also hope that they don't want to fit all 2-way, 3-way, ..., 100-way interactions as well...) – Dason Nov 26 '12 at 01:01
  • All the main effects plus two-way interactions on 100 variables would number `sum(choose(100, 1:2))` if my thinking is correct. Probably not what you're after (I too hope); and all the interactions possible?? Better brew a big pot of coffee. – Tyler Rinker Nov 26 '12 at 01:02
  • @mnel. Yes, I want to fit all interactions between the 100 factors. Maybe this could be exponential in time, but it should be possible in some sense. – Shahzad Nov 26 '12 at 01:05
  • @Shahzad How much data do you have? You realize that you're essentially fitting a model with 2^100 parameters. If you don't have at least 2^100 + 1 data points then you're fitting a very saturated model... – Dason Nov 26 '12 at 01:10
  • And if you really do need to fit a model with that many parameters and somehow have enough data to fit it, then you'll have to use special functions to actually do the fitting, because you can't allocate enough memory to hold that much data and make `lm` work. Then there are also the questionable assumptions that go along with fitting that model... – Dason Nov 26 '12 at 01:18
  • Also I just realized I was assuming that each of your factors only had 2 levels each. If there are more levels in each factor then your model is even more ridiculous! – Dason Nov 26 '12 at 02:17
  • @Dason. Yes. It has only 2 levels "1" and "2". – Shahzad Nov 26 '12 at 02:19
  • Ok. Well... good luck I guess. – Dason Nov 26 '12 at 02:19

1 Answer


You can use `update.formula` and the `~(.)^n` notation.

E.g., for a model including all interactions up to 3-way among 4 variables a, b, c and d:

update(~a+b+c+d, ~(.)^3)


## ~a + b + c + d + a:b + a:c + a:d + b:c + b:d + c:d + a:b:c + a:b:d + a:c:d + b:c:d
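To get a feel for how fast the number of terms grows with the interaction order, you can count the term labels of the expanded formula. This is only a rough sketch; `count_terms` is a made-up helper name for illustration, not something from base R.

```r
# Count the terms produced by expanding main effects up to a given
# interaction order. `count_terms` is a hypothetical helper, not part of R.
count_terms <- function(vars, order) {
  base <- reformulate(termlabels = vars)                      # ~ a + b + c + d
  expanded <- update(base, as.formula(paste0("~ (.)^", order)))
  length(attr(terms(expanded), "term.labels"))
}

# 4 main effects + 6 two-way + 4 three-way interactions
n_terms <- count_terms(c("a", "b", "c", "d"), 3)
n_terms

# The same count from binomial coefficients
sum(choose(4, 1:3))
```

With p two-level factors and interactions up to order k, the count is `sum(choose(p, 1:k))`, which is why order 100 on 100 factors is out of reach.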

So for your example where you want to fit 100-way interactions, I would suggest thinking of a more appropriate model (especially if it is time you are accounting for here).

If you decide to continue with the basic ANOVA approach you could do something like this (and wait for R to crash with memory issues caused by your large data / inappropriate model):

xvars <- paste0('x',1:100)
oneway <- reformulate(termlabels=  xvars, response = 'y')


horribleformula <- update(oneway, . ~ (.)^100)

horriblemodel <- lm(horribleformula, data=data)

Or (thanks to @Dason for picking this up)

stillhorrible <- lm(y ~ .^100, data = data)
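For comparison, capping the interaction order keeps the model fittable. The sketch below is an assumption about what a more manageable model might look like, not a recommendation for your data: it uses simulated data (5 two-level factors named `x1` to `x5`, all invented here) and stops at two-way interactions.

```r
# Simulated toy data: 5 two-level factors and a numeric response.
# All names and values are made up for illustration.
set.seed(1)
n <- 200
toy <- data.frame(lapply(1:5, function(i) {
  factor(sample(1:2, n, replace = TRUE), levels = 1:2)
}))
names(toy) <- paste0("x", 1:5)
toy$y <- rnorm(n)

# Main effects plus all two-way interactions only
fit2 <- lm(y ~ .^2, data = toy)

# Intercept + 5 main effects + choose(5, 2) = 10 two-way terms
n_coef <- length(coef(fit2))
n_coef

anova(fit2)
```

The same `.^2` formula on 100 two-level factors would need 1 + `sum(choose(100, 1:2))` coefficients, which is large but finite, whereas `.^100` asks for 2^100.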
mnel