
I'm using the rstatix library in R 3.6.3 through RStudio 1.2.5042 and am getting the seemingly impossible p-value of 1 when running a two-sample Wilcoxon rank-sum (aka Mann-Whitney U) test.

My first instinct says this is a floating point precision issue and the actual value is something like 0.999999 but I came here to confirm before an undisclosed federal research agency gets up my ass about it.
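One quick way to test the floating-point theory yourself (using a hypothetical near-1 value, not your actual output) is to print the stored p-value at full precision and compare it to 1 exactly:

```r
# Hypothetical near-1 value standing in for a test result's p-value
p <- 0.99999999999999989

# Print the underlying double rather than the rounded console display
format(p, digits = 17)

# An exact 1 compares equal; a rounding artifact does not
p == 1   # FALSE
```

If the comparison returns TRUE, the value really is 1 and rounding is not the explanation.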

Here's my code:

library(rstatix)

wilcox_test(data,
            DV ~ Group,
            paired = FALSE,
            exact = TRUE,
            alternative = "two.sided",
            conf.level = 0.95,
            detailed = TRUE)

Link to .csv formatted data

Data has been anonymized of course. This link expires in 1 week.

Cross-posting for consistency:

https://stats.stackexchange.com/questions/467572/p-value-of-1-for-mann-whitney-u-artifact-of-r

myfatson
  • P-value = 1 must mean the test statistic is the least possibly unusual or most possibly typical value. What is that "most boring" value for the test statistic? Can you verify that the test statistic has been computed correctly for your data? (dput(wilcox_test(...)) should show all the details.) Also, your data file is pretty small; instead of posting an ephemeral link, maybe just paste the data into your problem statement. Finally, this is arguably more on topic for stats.stackexchange.com since it's more conceptual. PS. tnx for the laugh. – Robert Dodier May 20 '20 at 17:27
  • Thanks @RobertDodier will do! – myfatson May 20 '20 at 17:37
  • 1
    Appreciate the cross-post link, but please avoid doing that: it wastes time and effort. Instead, guess the best venue and go with it. If you get no interest in a reasonable amount of time, or a consensus that you guessed wrong, then you can re-post/cross-post to the other place. – Ben Bolker May 20 '20 at 20:45

2 Answers


A little bit of digging finds that rstatix::wilcox_test() suppresses the warning that an exact p-value cannot be computed in the presence of ties. If you run plain old stats::wilcox.test() (which rstatix eventually calls anyway), this happens:

w <- wilcox.test(DV ~ Group, data = dat)

Warning message:
In wilcox.test.default(x = c(5L, 0L, 0L, 3L, 0L, 1L, 3L, 4L, 0L, :
  cannot compute exact p-value with ties

You can see here that rstatix is suppressing this warning.
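For illustration only (a toy stand-in, not rstatix's actual internals): suppressWarnings() throws the diagnostic away entirely, while withCallingHandlers() lets you record the warning and still get the result:

```r
# Toy function that warns the way wilcox.test() does with ties
noisy <- function() {
  warning("cannot compute exact p-value with ties")
  42
}

suppressWarnings(noisy())   # returns 42; the warning never reaches you

# Record the warning instead of discarding it
caught <- NULL
res <- withCallingHandlers(
  noisy(),
  warning = function(w) {
    caught <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
res     # 42
caught  # "cannot compute exact p-value with ties"
```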

Just to double-check, I worked through the guts of wilcox.test: the numerator of the Z-statistic for the approximate test is

STATISTIC - n.x * n.y / 2 - CORRECTION

which is then divided by SIGMA (see Wikipedia, although it doesn't mention the continuity correction).

In this case the W-statistic is 209.5, n.x*n.y/2 is 209, and the continuity correction is 0.5, so the numerator is exactly zero and the Z-statistic is zero regardless of SIGMA; pnorm(0) is 0.5, so the two-tailed p-value is exactly 1.
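The arithmetic can be re-checked directly (numbers taken from above):

```r
W <- 209.5          # observed W statistic
mu <- 209           # n.x * n.y / 2
correction <- 0.5   # continuity correction, sign(z) * 0.5
z <- W - mu - correction    # numerator is exactly 0, so SIGMA is irrelevant
p <- 2 * pnorm(-abs(z))     # two-sided normal-approximation p-value
z  # 0
p  # 1
```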

If you instead want to deal with the ties exactly:

coin::wilcox_test(DV ~ factor(Group), data = dat, distribution = "exact")
##  Exact Wilcoxon-Mann-Whitney Test
## data:  DV by factor(Group) (Control, Treatment)
## Z = 0.013327, p-value = 0.9954
## alternative hypothesis: true mu is not equal to 0
Ben Bolker
  • This is a great explanation! So is the moral here that when you have ties you should use the coin::wilcox_test instead? – myfatson May 21 '20 at 20:14
  • 1
    if you don't want to make asymptotic approximations, yes. (I claim that here the difference between p=0.9954 and p=1 is unlikely to be *practically* important ...). The other moral is that suppressing warning messages is dangerous. – Ben Bolker May 21 '20 at 20:16

P-values of 1 are not impossible, as described here and here. Also note that in your case the exact p-value cannot be computed because you have ties. The function you use does not surface this information, but the wilcox.test() function from the stats package gives a warning.

wilcox.test(DV ~ Group, data = test_data)

#>Warning message:
#>In wilcox.test.default(x = c(5, 0, 0, 3, 0, 1, 3, 4, 0, 3, 2, 1,  :
#>  cannot compute exact p-value with ties
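A minimal reproduction with made-up data (any tied values will do; the group names here are arbitrary):

```r
# Two groups with a tie (the two 2s); the samples are small,
# so wilcox.test() attempts an exact test by default and warns
x <- c(1, 2, 2, 4, 5)
g <- factor(c("a", "a", "b", "b", "b"))

msg <- tryCatch(
  { wilcox.test(x ~ g); NA_character_ },
  warning = function(w) conditionMessage(w)
)
msg  # "cannot compute exact p-value with ties"
```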

Ahorn
  • I also saw those posts but they did not clear up my question. The first link mentions Bonferroni correction, but I didn't do any p-value adjustments because there weren't multiple comparisons. As well, my data is not discrete, it's continuous. They also mention that I might need a "two.sided" test, which I was already using. So that post further suggests that this is a precision issue. I set exact p-value to FALSE and there was no difference. – myfatson May 20 '20 at 17:37
  • Ignore the part where I say this is continuous data, it is indeed ordinal data! – myfatson May 20 '20 at 17:42
  • 1
    sorry, didn't notice that the first part of my answer overlaps exactly with yours (although I did provide a little more detail ...) – Ben Bolker May 20 '20 at 20:48
  • No worries, your answer is indeed more detailed and contains additional insights. – Ahorn May 20 '20 at 21:26