x   6.18    3.76    5.15    4.02    2.52    1.41    3.36    8.67    9.36
y   9.39    13.50   10.80   12.70   14.70   13.40   10.10   4.12    10.30
z   6.35    3.90    5.32    5.08    8.38    5.84    3.96    3.78    
b   1.15    2.26    1.47    1.93    1.25    2.87    4.19    2.55    

I want to compare the four groups x, y, z, and b and find out which groups are significantly different.

thanks!

user1586241
  • You are getting negative votes because your question is not very clear as to its goals. If I have (or have not) correctly interpreted your intent, you should put in the appropriate clarifications. – IRTFM Sep 05 '12 at 21:28
  • I would suggest to repost in http://stats.stackexchange.com/ – S4M Sep 05 '12 at 21:29
  • if you do like @DWin's answer, you could reword your title (and question) to specify "*post hoc* tests of pairwise differences between groups", rather than "how to use Kruskal-Wallis" (which you've already demonstrated you can do) or "the significant values in the raw data" (which is hard to interpret in any sensible way) – Ben Bolker Sep 05 '12 at 21:36
  • @S4M. You're not the first person to suggest that: http://stackoverflow.com/questions/12287924/compare-more-than-two-samples#comment16484249_12287924 – GSee Sep 05 '12 at 21:39
  • What do you mean by "figure out the different values between these four groups"? Figure out the values that are significantly unlikely to be from other groups, but *are* within range of one group? Or what? – David Robinson Sep 05 '12 at 21:58

2 Answers


Kruskal-Wallis is a nonparametric test that compares multiple groups to see whether at least one differs significantly from the others. It does not determine whether specific values within any of the groups are significant.
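As a minimal sketch of running that global test on the data posted in the question (values copied from the table above; it returns a single p-value for the whole comparison, with no pairwise detail):

```r
# Kruskal-Wallis global test on the question's four groups
# (z and b have one fewer observation than x and y)
x <- c(6.18, 3.76, 5.15, 4.02, 2.52, 1.41, 3.36, 8.67, 9.36)
y <- c(9.39, 13.50, 10.80, 12.70, 14.70, 13.40, 10.10, 4.12, 10.30)
z <- c(6.35, 3.90, 5.32, 5.08, 8.38, 5.84, 3.96, 3.78)
b <- c(1.15, 2.26, 1.47, 1.93, 1.25, 2.87, 4.19, 2.55)

# kruskal.test accepts a list of group vectors
kruskal.test(list(x, y, z, b))
```

A small global p-value only says that the groups are not all drawn from the same distribution; identifying *which* groups differ requires a post-hoc procedure such as the one in the other answer.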

David Robinson

You might consider looking first at the means (after putting this data in a dataframe 'datm'):

> aggregate(datm$value, datm['variable'], mean, na.rm=TRUE)
  variable         x
1        x 0.9566667
2        y 1.4277778
3        z 2.3700000
4        b 0.0787500

Or at medians:

> aggregate(datm$value, datm['variable'], median, na.rm=TRUE)
  variable     x
1        x 0.750
2        y 1.710
3        z 2.265
4        b 0.010
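If you only have base R available, a rough alternative to the coin-based procedure (not the method used in this answer, just a sketch) is pairwise Wilcoxon rank-sum tests with a multiplicity adjustment. The long-format data is rebuilt here with base R's `stack()`, using the values DWin reads in later in this answer:

```r
# Base-R sketch: pairwise Wilcoxon rank-sum tests with Holm adjustment.
# The wide data matches the read.table() call later in this answer;
# stack() is a base-R stand-in for reshape2::melt().
dat <- data.frame(
  x = c(2.06, 1.08, 1.94, 1.32, 0.75, 0.18, 0.72, 0.22, 0.34),
  y = c(1.71, 2.73, 2.29, 1.71, 2.40, 0.45, 0.58, 0.35, 0.63),
  z = c(2.47, 1.75, 2.44, 2.50, 4.17, 2.09, 1.77, 1.77, NA),
  b = c(0.00, 0.00, 0.01, 0.01, 0.01, 0.20, 0.30, 0.10, NA))
datm <- setNames(stack(dat), c("value", "variable"))

# Ties in the data will trigger warnings about exact p-values;
# the adjusted p-value matrix is still returned.
pairwise.wilcox.test(datm$value, datm$variable, p.adjust.method = "holm")
```

This adjusts each pairwise comparison separately rather than using the joint ranking of the Nemenyi-Damico-Wolfe-Dunn approach below, so the p-values will differ somewhat.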

In package coin there is a post-hoc test based on ranks (as kruskal.test is). It is actually in the examples of the LocationTests help page and is reproduced without modification, except for the column names in the formula and the dataset name. There is no cited author for that page, but the package authors are Torsten Hothorn, Kurt Hornik, Mark A. van de Wiel, and Achim Zeileis:

### Nemenyi-Damico-Wolfe-Dunn test (joint ranking)
### Hollander & Wolfe (1999), page 244
### (where Steel-Dwass results are given)
library(coin)  # provides oneway_test, trafo, pvalue
if (require("multcomp")) {

    NDWD <- oneway_test(value ~ variable, data = datm,
        ytrafo = function(data) trafo(data, numeric_trafo = rank),
        xtrafo = function(data) trafo(data, factor_trafo = function(x)
            model.matrix(~x - 1) %*% t(contrMat(table(x), "Tukey"))),
        teststat = "max", distribution = approximate(B = 90000))

    ### global p-value
    print(pvalue(NDWD))

    ### DWin note: prints pairwise p-values for comparisons of rankings
    print(pvalue(NDWD, method = "single-step"))
}
#-----------------------
### global p-value
[1] 0
99 percent confidence interval:
 0.000000e+00 5.886846e-05

### pairwise p-values (single-step)
y - x 0.8287000
z - x 0.1039889
b - x 0.1107667
z - y 0.5421778
b - y 0.0053000
b - z 0.0000000
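Reading that pairwise table at a conventional alpha of 0.05, only the b-y and b-z comparisons fall below the threshold. A trivial way to pick those out (the p-values are copied from the output above; the vector name is just illustrative):

```r
# Adjusted pairwise p-values copied from the printed output above
p <- c("y - x" = 0.8287000, "z - x" = 0.1039889, "b - x" = 0.1107667,
       "z - y" = 0.5421778, "b - y" = 0.0053000, "b - z" = 0.0000000)

names(p)[p < 0.05]  # the pairs that differ significantly: "b - y" "b - z"
```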

To answer the question in the comment, this is what I did:

dat <- read.table(text = "x    y    z    b
                          2.06 1.71 2.47 0.00
                          1.08 2.73 1.75 0.00
                          1.94 2.29 2.44 0.01
                          1.32 1.71 2.50 0.01
                          0.75 2.40 4.17 0.01
                          0.18 0.45 2.09 0.20
                          0.72 0.58 1.77 0.30
                          0.22 0.35 1.77 0.10
                          0.34 0.63 NA   NA", header = TRUE)
require(reshape2)
# Loading required package: reshape2
datm <- melt(dat)  # then proceeded as above
IRTFM
  • Thanks very much, I'm just trying to understand it – user1586241 Sep 06 '12 at 22:57
  • Hi DWin, I think you got my idea. Just a technical question: how can I change my data (I corrected it in my question) into a data frame that can be used with your aggregate method? Thanks! – user1586241 Sep 07 '12 at 00:10