
How can I obtain the exact p-value of a Kruskal-Wallis test (e.g. with 3 groups) in R?

Example of data:

df <- data.frame(
    dv = c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46,
           1.15, 0.88, 0.90, 0.74, 1.21),
    group = factor(rep(c("A", "B", "C"), c(5, 5, 5))))

I tried the function kruskal_test from the coin package:

library(coin)
kruskal_test(dv ~ group, data = df, distribution = "exact")

However, this produces an error:

Error in .local(object, ...) : ‘object’ is not a two-sample problem

If I replace "exact" with "approximate" it runs, but then the p-value is a Monte Carlo approximation rather than one from the exact distribution.

Any thoughts?

Sinval
  • Perhaps you need a pairwise test – akrun May 22 '20 at 23:41
  • Try `combn(levels(df$group), 2, FUN = function(x) kruskal_test(dv ~ group, data = subset(df, group %in% x), distribution = 'exact'), simplify = FALSE)` – akrun May 22 '20 at 23:43
  • Have you tried `stats::kruskal.test(dv ~ group, data = df)`? – duckmayr May 22 '20 at 23:43
  • @duckmayr, yes sorry, you are correct, I somehow got confused for a moment between `kruskal.test` and `ks.test`. I too had success with your `kruskal.test` approach. – Ian Campbell May 22 '20 at 23:45
  • No worries @IanCampbell ! We all get turned around from time to time – duckmayr May 22 '20 at 23:45
  • Yes, it does the chi-square approximation. I saw in [this](https://doi.org/10.1080/00220973.2012.699904) 2013 paper that no package does it... but I would like to know if someone has already implemented it (besides SPSS's `exact` module)... – Sinval May 22 '20 at 23:47
  • Ah, I understand now your issue @Sinval , my apologies. I do not know of a solution to your exact issue, sorry – duckmayr May 22 '20 at 23:47
  • @Sinval I looked around a bit and to my knowledge there isn't an implementation of an exact algorithm for > 2 groups in R yet. Algorithms for it are presented though, for example in https://www.tandfonline.com/doi/abs/10.1081/SAC-120023876 , so it might be possible to implement yourself if you really need it. – duckmayr May 23 '20 at 00:00
  • Thanks, it really seems that it is not yet implemented in R. :/ – Sinval May 23 '20 at 00:07
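
Following up on the comments above: for samples this small, the exact permutation distribution can be obtained by brute-force enumeration in base R. What follows is only a minimal sketch under stated assumptions, not one of the published algorithms linked above and not part of any package. The helper name `exact_kw_3groups` is hypothetical, it handles exactly three groups, and it is only practical for small samples (the 5/5/5 design in the question has 756,756 allocations).

```r
# Hypothetical helper (not from coin or stats): exact permutation p-value of
# the Kruskal-Wallis test for exactly three groups, obtained by enumerating
# every allocation of the ranks to the groups.
exact_kw_3groups <- function(x, g) {
  g <- droplevels(as.factor(g))
  stopifnot(nlevels(g) == 3, length(x) == length(g))
  r <- rank(x)                     # midranks, so ties are handled
  n <- as.vector(table(g))         # group sizes, in level order
  N <- length(r)
  total <- sum(r)

  # For fixed group sizes, H is an increasing function of sum(R_j^2 / n_j),
  # so this simpler quantity orders the allocations exactly as H does.
  stat <- function(s1, s2, s3) s1^2 / n[1] + s2^2 / n[2] + s3^2 / n[3]

  s_obs <- as.vector(tapply(r, g, sum))
  obs <- stat(s_obs[1], s_obs[2], s_obs[3])

  idx1 <- combn(N, n[1])           # every choice of positions for group 1
  idx2 <- combn(N - n[1], n[2])    # every choice for group 2 among the rest
  hits <- 0
  for (j in seq_len(ncol(idx1))) {
    a <- idx1[, j]
    rest <- r[-a]
    s1 <- sum(r[a])
    s2 <- colSums(matrix(rest[idx2], nrow = n[2]))
    s3 <- total - s1 - s2
    # small tolerance guards against floating-point noise in the comparison
    hits <- hits + sum(stat(s1, s2, s3) >= obs - 1e-9)
  }
  hits / (ncol(idx1) * ncol(idx2))
}

df <- data.frame(
    dv = c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46,
           1.15, 0.88, 0.90, 0.74, 1.21),
    group = factor(rep(c("A", "B", "C"), c(5, 5, 5))))

p_exact <- exact_kw_3groups(df$dv, df$group)
p_exact
```

The result is the proportion of allocations whose statistic is at least as large as the observed one, so it can be compared directly with the asymptotic value from `stats::kruskal.test(dv ~ group, data = df)`.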

1 Answer

You're getting the error because coin can only compute the exact distribution for a two-sample problem.

From help("kruskal_test"):

...the distribution can be approximated via Monte Carlo resampling or computed exactly for univariate two-sample problems by setting distribution to "approximate" or "exact" respectively.
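
For more than two groups, the Monte Carlo route mentioned in that help text is what coin offers. A minimal sketch, assuming coin >= 1.3 (where the resample count is passed as `nresample`; older releases named it `B`); the seed and the number of resamples are arbitrary choices, and the result is still an approximation of the exact p-value, not the exact value itself:

```r
library(coin)  # assumes coin >= 1.3 (argument `nresample`; older releases used `B`)

df <- data.frame(
    dv = c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46,
           1.15, 0.88, 0.90, 0.74, 1.21),
    group = factor(rep(c("A", "B", "C"), c(5, 5, 5))))

set.seed(1)  # arbitrary seed, only to make the resampling reproducible
mc <- kruskal_test(dv ~ group, data = df,
                   distribution = approximate(nresample = 100000))

# Monte Carlo estimate of the permutation p-value
p_mc <- as.numeric(pvalue(mc))
p_mc
```

Increasing `nresample` tightens the estimate, at the cost of run time.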

Ian Campbell
  • Well, if I set to `"approximate"` it does work. So that is only true for the `"exact"` option. – Sinval May 22 '20 at 23:49
  • @Sinval I found some papers talking about the SAS exact approach, but I couldn't find anything implemented in R. If this answer isn't helpful, I can delete it. – Ian Campbell May 22 '20 at 23:55
  • Thank you, I know that the `IBM SPSS Statistics Exact Tests` module also manages to obtain those exact p-values. I really wanted to use R because of the `exams` package I'm using with my students. I want them to look up the exact p-values in a table and then enter them in the Moodle exam platform. Afterwards, the values will be compared with those produced by R... – Sinval May 22 '20 at 23:59