I'm not sure this is the right place for such a question, but maybe you can help me.

I want to check for differences in a quantitative variable between 3 treatments, i.e. perform a one-way ANOVA.
Unfortunately the residuals of my model aren't normally distributed.

In this situation I usually have two options: transform my data, or use a non-parametric equivalent of my test (here, a Kruskal-Wallis rank test).

None of the transformations I tried (log, 1/x, square root, Tukey and Box-Cox power) made the residuals normal, so I wanted to use a Kruskal-Wallis test and move on.

However, my project manager insisted on using only ANOVAs and presented ANOVA on ranks as a magic solution.

Working in R, I looked for examples and found the function art from the ARTool package, which performs an ANOVA on ranks.

library(ARTool)
model <- art(variable~treatment,data)
anova(model)

Basically, it takes your variable and replaces it with its ranks (handling ties by averaging the ranks), so that:

model2 <- lm(rank(variable, ties.method = "average")~treatment,data)
anova(model2)

gives exactly the same output.

I'm not an expert statistician, and I wonder how valid this solution/transformation is. It seems quite brutal to me, and not that far from the logic of the Kruskal-Wallis test, even though the statistic is not computed in the same way from the ranks.

I find it very confusing to have an 'ANOVA on ranks' test that is different from the Kruskal-Wallis test (also known as one-way ANOVA on ranks), and I don't know how to choose between the two.

I don't know whether I've been clear or whether anyone can help, but either way, thanks for your attention and comments!

PS: here is an example on dummy data

library(ARTool)

# note that the dummy data are random, so your results will differ from the ones shown below
treatment <- as.factor(c(rep("A", 100), rep("B", 100), rep("C", 100)))
variable <- as.numeric(c(sample(0:30, 100, replace = TRUE), sample(10:40, 100, replace = TRUE), sample(5:35, 100, replace = TRUE)))
dummy <- data.frame(treatment,variable)

model <- art(variable ~ treatment, data = dummy)
anova(model) #f.value = 30.746 and p = 7.312e-13

model2 <- lm(rank(variable, ties.method = "average")~treatment,dummy)
anova(model2) #f.value = 30.746 and p = 7.312e-13

kruskal.test(variable~treatment,dummy)
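
A minimal sketch, assuming base R only (no ARTool) and an arbitrary seed, of how the two statistics relate in the one-way case. With average ranks, the Kruskal-Wallis statistic H as computed by kruskal.test (tie correction included) equals (N - 1) * SS_between / SS_total of the ranks, so it can be rebuilt from the rank-ANOVA F value. The names f_value, h_value and h_from_f are illustrative only. The two tests therefore use the same information from the ranks and differ in the reference distribution used for the p-value (F versus chi-squared).

```r
# Sketch: relate the rank-ANOVA F statistic to the Kruskal-Wallis H statistic.
set.seed(42)  # arbitrary seed (an assumption), only so the sketch is repeatable

treatment <- factor(rep(c("A", "B", "C"), each = 100))
variable <- as.numeric(c(sample(0:30, 100, replace = TRUE),
                         sample(10:40, 100, replace = TRUE),
                         sample(5:35, 100, replace = TRUE)))
dummy <- data.frame(treatment, variable)

# One-way ANOVA on the average ranks
dummy$r <- rank(dummy$variable, ties.method = "average")
tab <- anova(lm(r ~ treatment, data = dummy))
f_value <- tab[["F value"]][1]

# Kruskal-Wallis test on the raw values
kw <- kruskal.test(variable ~ treatment, data = dummy)
h_value <- unname(kw$statistic)

n <- nrow(dummy)                 # total sample size N
k <- nlevels(dummy$treatment)    # number of groups

# H rebuilt from F, using H = (N - 1) * SS_between / SS_total on the ranks
h_from_f <- (n - 1) * (k - 1) * f_value / ((k - 1) * f_value + (n - k))

# Same statistic up to floating-point error
print(all.equal(h_value, h_from_f))

# Same rank information, different reference distributions for the p-value
p_f <- tab[["Pr(>F)"]][1]   # F distribution with (k - 1, N - k) df
p_kw <- kw$p.value          # chi-squared distribution with k - 1 df
print(c(rank_anova = p_f, kruskal_wallis = p_kw))
```

Because H is a monotone function of F here, the two tests order data sets the same way; any disagreement near a significance threshold comes from the reference distribution alone.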


Skadoosh
  • This should probably be posted on CrossValidated since it is not a programming question. You can use ranks with ANOVA, but the p-values are computed assuming you are using normally-distributed, continuous values so you cannot trust them. – dcarlson Apr 29 '20 at 22:13
  • The residuals should be normally distributed because you are using an F test to assess them. https://stats.stackexchange.com/questions/6350/anova-assumption-normality-normal-distribution-of-residuals – StupidWolf Apr 29 '20 at 22:22
  • If you rank the data, you will see that the residuals look odd, so don't do the lm + anova. – StupidWolf Apr 29 '20 at 22:23
