I'm getting NaN values in my ANOVA table when I run this code. I believe the factors for column 'V3' are improperly sorted. Is that the issue?

I've also tried the OLS routine in statsmodels (for Python), but there I was also getting an error about NaNs and infinite values.

```r
data <- read.csv(file = 'dogs2.csv', header = FALSE, sep = ",")
data
```

```
V1      V2      V3
0.28    Dog 1   Isofluorane
0.3     Dog 1   Halothane
1.07    Dog 1   Cyclopropane
0.51    Dog 2   Isofluorane
0.39    Dog 2   Halothane
1.35    Dog 2   Cyclopropane
1       Dog 3   Isofluorane
0.63    Dog 3   Halothane
0.69    Dog 3   Cyclopropane
0.39    Dog 4   Isofluorane
0.68    Dog 4   Halothane
0.28    Dog 4   Cyclopropane
0.29    Dog 5   Isofluorane
0.38    Dog 5   Halothane
1.24    Dog 5   Cyclopropane
0.36    Dog 6   Isofluorane
0.21    Dog 6   Halothane
1.53    Dog 6   Cyclopropane
0.32    Dog 7   Isofluorane
0.88    Dog 7   Halothane
0.49    Dog 7   Cyclopropane
0.69    Dog 8   Isofluorane
0.39    Dog 8   Halothane
0.56    Dog 8   Cyclopropane
0.17    Dog 9   Isofluorane
0.51    Dog 9   Halothane
1.02    Dog 9   Cyclopropane
0.33    Dog 10  Isofluorane
0.32    Dog 10  Halothane
0.3     Dog 10  Cyclopropane
```

```r
anova(lm(as.numeric(data$V1) ~ as.factor(data$V2) * as.factor(data$V3), data))
```

```
Warning message in anova.lm(lm(as.numeric(data$V1) ~ as.factor(data$V2) * as.factor(data$V3), :
"ANOVA F-tests on an essentially perfect fit are unreliable"

                                       Df    Sum Sq   Mean Sq  F value  Pr(>F)
as.factor(data$V2)                      9  268.9667  29.88519      NaN     NaN
as.factor(data$V3)                      2  168.4667  84.23333      NaN     NaN
as.factor(data$V2):as.factor(data$V3)  18  827.5333  45.97407      NaN     NaN
Residuals                               0    0.0000       NaN       NA      NA
```

I'm not sure why the F statistics are NaN.

EDIT: The ANOVA table is complete when I get rid of interaction from the model and use 'V2 + V3' instead of 'V2 * V3'. However, I'm certain that I want to measure interaction between these two variables.
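
A side note on `as.numeric(data$V1)`, offered as an assumption about how the file was read rather than something the output above confirms: if `V1` came in as a factor (for example because the column contains a non-numeric entry), `as.numeric()` returns the integer level codes, not the measured values. The sums of squares in the hundreds look large for responses between 0.17 and 1.53, which would be consistent with that. A minimal sketch with made-up values and hypothetical variable names:

```r
# Hypothetical illustration: a numeric column that was read as a factor.
v1_as_factor <- factor(c("0.28", "1.07", "0.3"))

# as.numeric() on a factor gives the internal level codes
# (levels are sorted as text: "0.28", "0.3", "1.07").
codes <- as.numeric(v1_as_factor)

# Converting through character recovers the measured values.
values <- as.numeric(as.character(v1_as_factor))

print(codes)
print(values)
```

Running `str(data)` on the real data frame would show whether `V1` is numeric or a factor.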

E. Kaufman
  • Possible duplicate of [R: In anova.lm(g) : ANOVA F-tests on an essentially perfect fit are unreliable](https://stackoverflow.com/questions/8550288/r-in-anova-lmg-anova-f-tests-on-an-essentially-perfect-fit-are-unreliable) – Sang won kim Apr 03 '19 at 05:47
  • I think maybe it has something to do with the fact that there is only one observation in each cell. Notice how each dog gets each treatment exactly once. – E. Kaufman Apr 03 '19 at 06:28
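
The one-observation-per-cell point in the comment above can be checked directly. The sketch below retypes the data from the question into a data frame; the names `dogs`, `response`, `dog` and `agent` are illustrative renames of `data`, `V1`, `V2` and `V3`, not names from the original code. With 30 observations and 10 x 3 = 30 cells, the saturated model uses every degree of freedom and leaves none for the residual.

```r
# Data retyped from the question; column names are assumptions for readability.
dogs <- data.frame(
  response = c(0.28, 0.30, 1.07,  0.51, 0.39, 1.35,  1.00, 0.63, 0.69,
               0.39, 0.68, 0.28,  0.29, 0.38, 1.24,  0.36, 0.21, 1.53,
               0.32, 0.88, 0.49,  0.69, 0.39, 0.56,  0.17, 0.51, 1.02,
               0.33, 0.32, 0.30),
  dog   = factor(rep(paste("Dog", 1:10), each = 3),
                 levels = paste("Dog", 1:10)),
  agent = factor(rep(c("Isofluorane", "Halothane", "Cyclopropane"), times = 10))
)

# Number of observations in each dog-by-agent cell.
cell_counts <- table(dogs$dog, dogs$agent)
print(cell_counts)

# Residual degrees of freedom of the model with interaction: n - a * b.
n <- nrow(dogs)
a <- nlevels(dogs$dog)
b <- nlevels(dogs$agent)
resid_df <- n - a * b

# The fitted saturated model reports the same residual degrees of freedom.
fit_full <- lm(response ~ dog * agent, data = dogs)
print(c(by_hand = resid_df, from_lm = df.residual(fit_full)))
```

With zero residual degrees of freedom the residual mean square is 0/0, so every F ratio in the table is undefined.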

1 Answer

Answer:

```r
aov(lm(as.numeric(data$V1) ~ as.factor(data$V2) * as.factor(data$V3), data))
```

One-way ANOVA

Sang won kim
  • We need a way to make R recognize the data as a block design experiment where individual subjects undergo multiple treatments in random order. Oh, and didn't you mean '+' instead of '*'? – E. Kaufman Apr 03 '19 at 15:10
  • It seems that the F statistic is equal to ( SS_V? / ( # of levels in V? - 1 ) ) / ( SS_Error / ( # of levels in V2 * # of levels in V3 * ( # of observations in each cell - 1 ) ) ). However, since there is only 1 observation in each cell, we get a divide-by-zero error. – E. Kaufman Apr 03 '19 at 15:42
  • How do we get R to set the denominator of the F statistic to SS_(V2:V3) / ( ( # of levels in V2 - 1 ) * ( # of levels in V3 - 1 ) ) instead of what it currently is? – E. Kaufman Apr 03 '19 at 15:56
  • Two ways come to mind. If you want to look at the statistical results in detail, you can look at the function's built-in code or split the data. I'm sorry if it did not help. – Sang won kim Apr 04 '19 at 02:53
  • It's been a while, but yeah, each treatment cell has exactly one observation, which is a no-no. If `dof(Error) = n - a*b` where `a = 10` and `b = 3`, then of course we'll get a divide-by-zero error when computing the F statistic for interaction. We need some replication, like [this guy](https://stackoverflow.com/a/8550335/7854570) says, so that we can increase `n`. – E. Kaufman Nov 14 '22 at 03:24
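
One way to get the denominator asked about in the comments above, sketched under the usual randomized complete block assumption that the dog-by-agent interaction is negligible: drop the interaction term. The residual of the additive model then has (10 - 1) * (3 - 1) = 18 degrees of freedom, and its sum of squares equals what the saturated model reported as interaction. The names `dogs`, `response`, `dog` and `agent` are illustrative renames of `data`, `V1`, `V2` and `V3`. This does not test the interaction itself, which still needs replication.

```r
# Data retyped from the question; column names are assumptions for readability.
dogs <- data.frame(
  response = c(0.28, 0.30, 1.07,  0.51, 0.39, 1.35,  1.00, 0.63, 0.69,
               0.39, 0.68, 0.28,  0.29, 0.38, 1.24,  0.36, 0.21, 1.53,
               0.32, 0.88, 0.49,  0.69, 0.39, 0.56,  0.17, 0.51, 1.02,
               0.33, 0.32, 0.30),
  dog   = factor(rep(paste("Dog", 1:10), each = 3),
                 levels = paste("Dog", 1:10)),
  agent = factor(rep(c("Isofluorane", "Halothane", "Cyclopropane"), times = 10))
)

# Additive (block) model: dog is the block, agent is the treatment.
fit_add <- lm(response ~ dog + agent, data = dogs)
tab <- anova(fit_add)
print(tab)

# Saturated model, fitted only to compare sums of squares.
# The "essentially perfect fit" warning is expected here and suppressed.
fit_full <- lm(response ~ dog * agent, data = dogs)
tab_full <- suppressWarnings(anova(fit_full))

interaction_ss <- tab_full["dog:agent", "Sum Sq"]
residual_ss <- tab["Residuals", "Sum Sq"]

# F statistic for agent computed by hand, with the residual of the
# additive model (18 df) as the denominator.
f_agent <- (tab["agent", "Sum Sq"] / tab["agent", "Df"]) /
  (residual_ss / tab["Residuals", "Df"])
print(c(by_hand = f_agent, from_anova = tab["agent", "F value"]))

# The same design written with an explicit error stratum for dog.
print(summary(aov(response ~ agent + Error(dog), data = dogs)))
```

The `Error(dog)` form treats each dog as its own stratum, so the agent effect is tested within dogs against the same 18-degree-of-freedom residual.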