I would like to perform non-parametric testing on a data frame. I have three groups A, B, C, and I'd like to know the statistical significance of the differences between groups A/B, B/C and A/C. How can I do that non-parametrically? Applying the Kruskal-Wallis test gives me the overall inference across groups, which serves as protection for the subsequent post-hoc test. But how do I program the non-parametric post-hoc test (using either Kruskal-Wallis or Mann-Whitney U)?

x<-c(1,2,3,4,5,6,7,8,9,NA,9,8)
y<-c(2,3,NA,3,4,NA,2,3,NA,2,3,4)
group<-rep((factor(LETTERS[1:3])),4)
df<-data.frame(x,y,group)
df
Doc
  • I'm flagging for migration, as this seems to be more of a *which statistical test* question than a *how to implement in R* one – user1317221_G Jan 06 '13 at 12:20
  • No! Actually it is not! The statistical tests are quite clear. This is a programming issue. So far my research has not yielded any results for multiple comparisons in R using NON-PARAMETRIC tests at all. – Doc Jan 06 '13 at 12:22
  • Then try to state in your question which statistical tests you would like to perform, so other users can quickly show you how to do the test you want. – user1317221_G Jan 06 '13 at 12:23
  • You might want to see http://stackoverflow.com/q/2478272/1317221 – user1317221_G Jan 06 '13 at 12:34
  • And possibly http://stats.stackexchange.com/a/20133 – user1317221_G Jan 06 '13 at 12:40

1 Answer

OK, just to summarize the discussion in the comments above: there are several (not so well known) ways to perform multiple non-parametric comparisons in R. I have included two of them for the example above:

library(pgirmess)  # provides kruskalmc()
library(nparcomp)  # provides nparcomp()

x<-c(1,2,3,4,5,6,7,8,9,NA,8,9)
y<-c(2,3,NA,3,4,NA,2,3,NA,2,3,4)
group<-rep((factor(LETTERS[1:3])),4)
df<-data.frame(x,y,group)


kruskal.test(df$x~df$group)
kruskalmc(df$x~df$group)

m<-nparcomp(x ~ group, data=df, asy.method = "probit", type = "Dunnett", control = "A", alternative = "two.sided", info = FALSE)
summary(m) 

nparcomp is obviously more flexible and allows a large variety of contrasts. Here I picked Dunnett as an example.
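For the three pairwise comparisons asked about (A/B, A/C, B/C), base R alone may already be enough. The following is only a sketch on the toy data from above, not part of the original discussion: it runs Mann-Whitney U tests for every pair via pairwise.wilcox.test and adjusts the p-values. The Holm adjustment and exact = FALSE are my own choices here, and the object names kw and pw are just for illustration.

```r
# Same toy data as above; NA values are dropped by the tests
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA, 8, 9)
group <- rep(factor(LETTERS[1:3]), 4)
df <- data.frame(x, group)

# Global Kruskal-Wallis test as the gatekeeper
kw <- kruskal.test(x ~ group, data = df)
print(kw)

# All pairwise Mann-Whitney U tests (A/B, A/C, B/C), Holm-adjusted.
# exact = FALSE requests the normal approximation, which avoids the
# "cannot compute exact p-value with ties" warning on this data.
pw <- pairwise.wilcox.test(df$x, df$group,
                           p.adjust.method = "holm", exact = FALSE)

# Lower-triangular matrix of adjusted p-values (rows B, C; columns A, B)
print(pw$p.value)
```

With only three or four observations per group this will hardly ever reach significance; it is meant to show the calls, not a meaningful result.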

A further procedure for multiple testing has been proposed (shown below using oneway_test from the coin package), but according to several posts, accuracy problems have appeared with large datasets: https://stat.ethz.ch/pipermail/r-help/2012-January/300100.html

library(coin)      # oneway_test(), trafo(), approximate(), pvalue()
library(multcomp)  # contrMat()
library(ggplot2)   # diamonds data set

NDWD <- oneway_test(price ~ clarity, data = diamonds,
        ytrafo = function(data) trafo(data, numeric_trafo = rank),
        xtrafo = function(data) trafo(data, factor_trafo = function(x)
            model.matrix(~x - 1) %*% t(contrMat(table(x), "Tukey"))),
        teststat = "max", distribution = approximate(B=1000))

### global p-value
print(pvalue(NDWD))

### adjusted p-values for all pairwise (Tukey) comparisons
print(pvalue(NDWD, method = "single-step"))

Another possibility would be a proportional odds model (rms::lrm or rms::orm; polr itself lives in MASS) followed by rms::contrast, as suggested by Frank Harrell: https://stat.ethz.ch/pipermail/r-help/2012-January/300329.html

Finally, user1317221_G included some very useful links: a boxplot incorporating the results of the test (https://stats.stackexchange.com/a/20133) and, one link further, a more detailed description of advanced boxplot graphing (http://egret.psychol.cam.ac.uk/statistics/R/graphs2.html).
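Since the question is about a whole data frame rather than a single variable, the same kind of comparison can be looped over the response columns. This is a minimal base-R sketch under my own assumptions (Holm adjustment, normal approximation, and the names responses and results are only illustrative). Note that y has just one non-missing value in group C, so its output is for demonstration only.

```r
# Toy data from the question: two responses, one grouping factor
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA, 8, 9)
y <- c(2, 3, NA, 3, 4, NA, 2, 3, NA, 2, 3, 4)
group <- rep(factor(LETTERS[1:3]), 4)
df <- data.frame(x, y, group)

# Columns to test (everything except the grouping factor)
responses <- c("x", "y")

# For each response: global Kruskal-Wallis p-value plus
# Holm-adjusted pairwise Mann-Whitney U p-values
results <- lapply(df[responses], function(v) {
  list(
    global   = kruskal.test(v, df$group)$p.value,
    pairwise = pairwise.wilcox.test(v, df$group,
                                    p.adjust.method = "holm",
                                    exact = FALSE)$p.value
  )
})

print(results$x$global)
print(results$y$pairwise)
```

The pairwise results should only be interpreted for columns where the global test was significant, which is the "protection" mentioned in the question.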

Hopefully that solves a couple of problems in this area.

Doc