x   6.18    3.76    5.15    4.02    2.52    1.41    3.36    8.67    9.36
y   9.39    13.50   10.80   12.70   14.70   13.40   10.10   4.12    10.30
z   6.35    3.90    5.32    5.08    8.38    5.84    3.96    3.78    
b   1.15    2.26    1.47    1.93    1.25    2.87    4.19    2.55    

I want to compare the four groups x, y, z, and b and find out which groups are significantly different.

thanks!

user1586241
  • You are getting negative votes because your question is not very clear as to its goals. If I have (or have not) correctly interpreted your intent, you should put in the appropriate clarifications. – IRTFM Sep 05 '12 at 21:28
  • I would suggest to repost in http://stats.stackexchange.com/ – S4M Sep 05 '12 at 21:29
  • if you do like @DWin's answer, you could reword your title (and question) to specify "*post hoc* tests of pairwise differences between groups", rather than "how to use Kruskal-Wallis" (which you've already demonstrated you can do) or "the significant values in the raw data" (which is hard to interpret in any sensible way) – Ben Bolker Sep 05 '12 at 21:36
  • @S4M. You're not the first person to suggest that: http://stackoverflow.com/questions/12287924/compare-more-than-two-samples#comment16484249_12287924 – GSee Sep 05 '12 at 21:39
  • What do you mean by "figure out the different values between these four groups"? Figure out the values that are significantly unlikely to be from other groups, but *are* within range of one group? Or what? – David Robinson Sep 05 '12 at 21:58

2 Answers


Kruskal-Wallis is a nonparametric test that compares multiple groups to see whether at least one differs significantly from the others. It does not determine whether specific values within any of the groups are significant.
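As a minimal sketch of running that global test on the data posted in the question (values copied from the table above; it returns a single p-value for the whole comparison, with no pairwise detail):

```r
# Kruskal-Wallis global test on the question's four groups
# (z and b have one fewer observation than x and y)
x <- c(6.18, 3.76, 5.15, 4.02, 2.52, 1.41, 3.36, 8.67, 9.36)
y <- c(9.39, 13.50, 10.80, 12.70, 14.70, 13.40, 10.10, 4.12, 10.30)
z <- c(6.35, 3.90, 5.32, 5.08, 8.38, 5.84, 3.96, 3.78)
b <- c(1.15, 2.26, 1.47, 1.93, 1.25, 2.87, 4.19, 2.55)

# kruskal.test accepts a list of group vectors
kruskal.test(list(x, y, z, b))
```

A small global p-value only says that the groups are not all drawn from the same distribution; identifying *which* groups differ requires a post-hoc procedure such as the one in the other answer.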

David Robinson

You might consider looking first at the means (after putting this data in a dataframe 'datm'):

> aggregate(datm$value, datm['variable'], mean, na.rm=TRUE)
  variable         x
1        x 0.9566667
2        y 1.4277778
3        z 2.3700000
4        b 0.0787500

Or at medians:

> aggregate(datm$value, datm['variable'], median, na.rm=TRUE)
  variable     x
1        x 0.750
2        y 1.710
3        z 2.265
4        b 0.010
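If you only have base R available, a rough alternative to the coin-based procedure (not the method used in this answer, just a sketch) is pairwise Wilcoxon rank-sum tests with a multiplicity adjustment. The long-format data is rebuilt here with base R's `stack()`, using the values DWin reads in later in this answer:

```r
# Base-R sketch: pairwise Wilcoxon rank-sum tests with Holm adjustment.
# The wide data matches the read.table() call later in this answer;
# stack() is a base-R stand-in for reshape2::melt().
dat <- data.frame(
  x = c(2.06, 1.08, 1.94, 1.32, 0.75, 0.18, 0.72, 0.22, 0.34),
  y = c(1.71, 2.73, 2.29, 1.71, 2.40, 0.45, 0.58, 0.35, 0.63),
  z = c(2.47, 1.75, 2.44, 2.50, 4.17, 2.09, 1.77, 1.77, NA),
  b = c(0.00, 0.00, 0.01, 0.01, 0.01, 0.20, 0.30, 0.10, NA))
datm <- setNames(stack(dat), c("value", "variable"))

# Ties in the data will trigger warnings about exact p-values;
# the adjusted p-value matrix is still returned.
pairwise.wilcox.test(datm$value, datm$variable, p.adjust.method = "holm")
```

This adjusts each pairwise comparison separately rather than using the joint ranking of the Nemenyi-Damico-Wolfe-Dunn approach below, so the p-values will differ somewhat.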

In package coin there is a post-hoc test based on ranks (as kruskal.test is). It is actually in the examples of the LocationTests help page and is reproduced without modification, except for the column names in the formula and the dataset name. There is no cited author for that page, but the package authors are Torsten Hothorn, Kurt Hornik, Mark A. van de Wiel, and Achim Zeileis:

### Nemenyi-Damico-Wolfe-Dunn test (joint ranking)
### Hollander & Wolfe (1999), page 244
### (where Steel-Dwass results are given)
library(coin)  # provides oneway_test, trafo, pvalue
if (require("multcomp")) {

    NDWD <- oneway_test(value ~ variable, data = datm,
        ytrafo = function(data) trafo(data, numeric_trafo = rank),
        xtrafo = function(data) trafo(data, factor_trafo = function(x)
            model.matrix(~x - 1) %*% t(contrMat(table(x), "Tukey"))),
        teststat = "max", distribution = approximate(B = 90000))

    ### global p-value
    print(pvalue(NDWD))

    ### DWin note: prints pairwise p-values for comparisons of rankings
    print(pvalue(NDWD, method = "single-step"))
}
#-----------------------
### global p-value
[1] 0
99 percent confidence interval:
 0.000000e+00 5.886846e-05

### pairwise p-values (single-step)
y - x 0.8287000
z - x 0.1039889
b - x 0.1107667
z - y 0.5421778
b - y 0.0053000
b - z 0.0000000
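Reading that pairwise table at a conventional alpha of 0.05, only the b-y and b-z comparisons fall below the threshold. A trivial way to pick those out (the p-values are copied from the output above; the vector name is just illustrative):

```r
# Adjusted pairwise p-values copied from the printed output above
p <- c("y - x" = 0.8287000, "z - x" = 0.1039889, "b - x" = 0.1107667,
       "z - y" = 0.5421778, "b - y" = 0.0053000, "b - z" = 0.0000000)

names(p)[p < 0.05]  # the pairs that differ significantly: "b - y" "b - z"
```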

To answer the question in the comment, this is what I did:

dat <- read.table(text = "x    y    z    b
                          2.06 1.71 2.47 0.00
                          1.08 2.73 1.75 0.00
                          1.94 2.29 2.44 0.01
                          1.32 1.71 2.50 0.01
                          0.75 2.40 4.17 0.01
                          0.18 0.45 2.09 0.20
                          0.72 0.58 1.77 0.30
                          0.22 0.35 1.77 0.10
                          0.34 0.63 NA   NA", header = TRUE)
require(reshape2)
# Loading required package: reshape2
datm <- melt(dat)  # then proceeded as above
IRTFM
  • Thanks very much, I'm just trying to understand it – user1586241 Sep 06 '12 at 22:57
  • Hi DWin, I think you got my idea. Just a technical question: how can I change my data (I corrected it in my question) into a data frame that can be used with your aggregate method? Thanks! – user1586241 Sep 07 '12 at 00:10