2

The weighted.median() function in the spatstat package returns "10.5", when I pass the evenly weighted scores of 10, 11, & 12. I was expecting the response of "11" (which is the output of stats::median() and matrixStats::weightedMedian()).

The concept of a weighted median isn't very natural to me. Is the output incorrect, or am I misunderstand the purpose of the function?

x <- c(10, 11, 12)
w <- c( 1,  1,  1)

spatstat::weighted.median(x, w)
#> [1] 10.5
spatstat::weighted.quantile(x, w, probs = .5)
#>  50% 
#> 10.5


matrixStats::weightedMedian(x, w)
#> [1] 11
median(x)
#> [1] 11

Created on 2020-02-23 by the reprex package (v0.3.0)

wibeasley
  • 5,000
  • 3
  • 34
  • 62

2 Answers2

2

There is a more fundamental issue here about the definition of a quantile (including the median) in small finite samples.

The help file for the R base function quantile.default says that there is an argument type, with 7 different options, which will give different answers. These are carefully described in a fine article by Rob Hyndman, cited in the help file. The default for quantile.default is type=7.

The algorithm in spatstat::weighted.quantile performs the analogue of type=4 (according to its help file); that is, the cumulative distribution function F(x) is linearly interpolated and then the inverse function is computed. This algorithm is correctly implemented in the spatstat code.

The weighted median in the other package you mentioned is computing a different definition of the weighted median.

Thank you very much for drawing our attention to this example. This may prompt us to extend the implementation of spatstat::weighted.median to embrace the other types.

Incidentally, bug reports for a CRAN package should be posted on the package's bug reports page as shown on CRAN. It's just lucky that I saw this post. But thank you very much, both of you, for spotting this issue.

Adrian Baddeley
  • 2,534
  • 1
  • 5
  • 8
  • Thanks. I wasn't sure what was a problem with the software, and what was a problem with my brain. I'm sorry, I don't see "Hyndman" anywhere in the reference manual. Am I looking in the wrong help file? – wibeasley Feb 24 '20 at 04:46
  • 1
    Thanks Adrian. After reading the references you cite, I appreciate that the quantile algorithm is not wrong as such. I think though that if you apply the [Principle of least astonishment](https://en.wikipedia.org/wiki/Principle_of_least_astonishment), then the default behaviour should be that the median of 10, 11, 12 with equal weighting would be 11, – Allan Cameron Feb 24 '20 at 09:41
  • 1
    Just following up on the question by @wibeasley about the reference to Hyndman. It is in the help page for `quantile.default`, so you should be able to find it by typing `?quantile.default` in the R console. – Ege Rubak Mar 02 '20 at 19:19
1

I believe this is a flaw in the package, and I'll explain why.

Firstly, weighted.median actually just calls weighted.quantile with the probs vector set to 0.5. But if you call weighted.quantile with your data, you get very strange results:

weighted.quantile(x, w)
#>    0%   25%   50%   75%  100% 
#> 10.00 10.00 10.50 11.25 12.00 

That's not right.

If you look at the body of this function using body(weighted.quantile), and follow the logic through, there seems to be a problem with the way the weights are normalized on line 10 into a variable called Fx. To work properly, the normalized weights should be a vector of the same length as x, but starting at 0 and ending in 1, with the spacing in between being proportional to the weights.

But if you look at how this is actually calculated:

body(weighted.quantile)[[10]]
#> Fx <- cumsum(w)/sum(w)

You can see it doesn't start at 0. In your case, the first element would be 0.3333.

So to show this is the case, let's write over this line with the correct expression. (First we need to unlock the binding to give access to the function)

unlockBinding("weighted.quantile", asNamespace("spatstat"))
body(weighted.quantile)[[10]] <- substitute(Fx <- (cumsum(w) - min(w))/(sum(w) - min(w)))

Now we get the correct result for weighted quantiles (including the correct median)

weighted.quantile(x, w)
#>   0%  25%  50%  75% 100% 
#> 10.0 10.5 11.0 11.5 12.0 
Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • I like the unlockBinding technique. I've referenced your response in a GitHub Issue 128: https://github.com/spatstat/spatstat/issues/128 – wibeasley Feb 24 '20 at 02:19