i'm not sure that this is the perfect place for such a question but maybe you can help me.
I want to check for differences of a quantitative variable between 3 treatments, i.e perform an ANOVA.
Unfortunately the residuals of my model aren't normally distributed.
I usually have here two solutions : Transform my data or use a non parametric equivalent of my test (here a kruskal wallis rank test).
None of the transformations that i tried managed to satisfy normality (log, 1/x, square root, tukey and boxcox power) so I wanted to use a kruskal and to move on.
However, my project manager insisted on having only ANOVAs and talked about ANOVA on rank as a magic solution.
Working on R I looked for some examples and find a function art
from ARTool
package that perform anova on rank.
library(ARTool)
model <- art(variable~treatment,data)
anova(model)
Basically it takes your variable and replace it by its rank (dealing with ties by averaging the rank) as :
model2 <- lm(rank(variable, ties.method = "average")~treatment,data)
anova(model2)
gives exactly the same output.
I'm not an expert statistician and I wonder how valid is this solution/transformation. It seems quite brutal to me and not this far from the logic of the kruskal-wallis test even tho the statistic is not computed directly on ranks.
I find this very confusing to have an 'ANOVA on ranks' test that is different from the kruskal-wallis (also known as One-way ANOVA on ranks) and I don't know how to chose between those two tests.
I don't know if I've been very clear and if someone can help me but, anyway, Thanks for your attention and comments!
PS: here is an exemple on dummy data
library(ARTool)
# note that dummy data are random so we shouldn't have the same results
treatment <- as.factor(c(rep("A",100),rep("B",100),rep("C",100)))
variable <- as.numeric(c(sample(c(0:30),100,replace=T),sample(c(10:40),100,replace=T),sample(c(5:35),100,replace=T)))
dummy <- data.frame(treatment,variable)
model <- art(variable~treatment)
anova(model) #f.value = 30.746 and p = 7.312e-13
model2 <- lm(rank(variable, ties.method = "average")~treatment,dummy)
anova(model2) #f.value = 30.746 and p = 7.312e-13
kruskal.test(variable~treatment,dummy)