I'm looking for something similar in form to weighted.mean()
. I've found some solutions via search that write out the entire function but would appreciate something a bit more user friendly.

- 211,554
- 25
- 370
- 453

- 1,125
- 2
- 9
- 13
9 Answers
The following packages all have a function to calculate a weighted median: 'aroma.light', 'isotone', 'limma', 'cwhmisc', 'ergm', 'laeken', 'matrixStats, 'PSCBS', and 'bigvis' (on github).
To find them I used the invaluable findFn() in the 'sos' package which is an extension for R's inbuilt help.
findFn('weighted median')
Or,
???'weighted median'
as ??? is a shortcut in the same way ?some.function
is for help(some.function)

- 118,240
- 47
- 247
- 360

- 7,226
- 3
- 31
- 23
-
3I didn't know about findFn! That's awesome! – Bob Albright May 01 '10 at 05:39
-
Agreed on the findFn. Very useful. And rather than install a new package I just got some sleep. I'm just trying to calculate the median of weighted data and did this: median(rep(x, each=w)). – Michael Williams May 01 '10 at 14:09
-
Yep, median(rep(x, each=w)), would work too. But only if all your weights are integers. – wkmor1 May 01 '10 at 23:35
-
11Hmisc also has wtd.quantile :) – Anthony Damico Mar 03 '13 at 09:55
-
That's going to be `median(rep(x,times=w))` if you want to give a vector with weights. The argument `each` only takes a single value. --- edit: Just saw the answer of @Jaitropmange. As (s)he said. – Joris Meys Oct 15 '13 at 21:27
Some experience using the answers from @wkmor1 and @Jaitropmange.
I've checked 3 functions from 3 packages, isotone
, laeken
, and matrixStats
. Only matrixStats
works properly. Other two (just as the median(rep(x, times=w)
solution) give integer output. As long as I calculated median age of populations, decimal places matter.
Reproducible example. Calculation of the median age of a population
df <- data.frame(age = 0:100,
pop = spline(c(4,7,9,8,7,6,4,3,2,1),n = 101)$y)
library(isotone)
library(laeken)
library(matrixStats)
isotone::weighted.median(df$age,df$pop)
# [1] 36
laeken::weightedMedian(df$age,df$pop)
# [1] 36
matrixStats::weightedMedian(df$age,df$pop)
# [1] 36.164
median(rep(df$age, times=df$pop))
# [1] 35
Summary
matrixStats::weightedMedian()
is the reliable solution

- 33,841
- 14
- 113
- 198

- 2,941
- 1
- 25
- 43
-
4Note that the rep(x, times=w) method requires integer weights, so it does not apply for your case. You could approximate using: median(rep(df$age, times=1000*df$pop)), which gives 36. Whether you want a decimal output depends on your definition of median. – Jaitropmange Oct 04 '15 at 15:28
-
2with one little detail... Answer 36 is correct according to definition of [weighted median on wikipedia](https://en.wikipedia.org/wiki/Weighted_median). – Kamil S Jaron Dec 07 '16 at 09:58
-
4
-
7I do not understand the sarcasm here. Is it not quite devastating to have many functions for the same thing, but with different results? How do you know that `matrixStats::weightedMedian()` gives the _reliable solution_? [The code](https://github.com/HenrikBengtsson/matrixStats/blob/master/src/weightedMedian_lowlevel_template.h) seems to suggest it should produce the [weighted percentile method result](https://en.wikipedia.org/wiki/Percentile#Weighted_percentile), but the values are different from `spatstat::weighted.median()` which uses this method and yields `35.66291` to the problem above. – 0range Oct 22 '18 at 23:49
-
3Of course, instead of choosing one way to compute the weighted median, we can also use the weighted median over all of them... – 0range Oct 23 '18 at 00:14
To calculate the weighted median of a vector x
using a same length vector of (integer) weights w
:
median(rep(x, times=w))

- 8,073
- 1
- 52
- 58

- 392
- 4
- 13
-
4This only works with integer weights. Weights in survey data are typically decimals. – Westcroft_to_Apse Oct 24 '17 at 17:31
-
2Very bad idea. What happens if `w` has very large values? Filling memory just to compute a median is unwise. – Feb 18 '22 at 18:15
This is just a simple solution, ready to use almost anywhere.
weighted.median <- function(x, w) {
w <- w[order(x)]
x <- x[order(x)]
prob <- cumsum(w)/sum(w)
ps <- which(abs(prob - .5) == min(abs(prob - .5)))
return(x[ps])
}

- 718
- 5
- 10
Really old post but I just came across it and did some testing of the different methods. spatstat::weighted.median()
seemed to be about 14 times faster than median(rep(x, times=w))
and its actually noticeable if you want to run the function more than a couple times. Testing was with a relatively large survey, about 15,000 people.

- 83
- 2
- 4
-
This function is now in the `{spatstat.geom}` package : `spatstat.geom::weighted.median()` – mdag02 Aug 30 '21 at 06:55
One can also use stats::density
to create a weighted PDF, then convert this to a CDF, as elaborated here:
my_wtd_q = function(x, w, prob, n = 4096)
with(density(x, weights = w/sum(w), n = n),
x[which.max(cumsum(y*(x[2L] - x[1L])) >= prob)])
Then my_wtd_q(x, w, .5)
will be the weighted median.
One could also be more careful to ensure that the total area under the density
is one by re-normalizing.

- 33,841
- 14
- 113
- 198
A way in base to get a weighted median will be to order by the values and build the cumsum
of the weights and get the value(s) at sum * 0.5
of the weights.
medianWeighted <- function(x, w, q=.5) {
n <- length(x)
i <- order(x)
w <- cumsum(w[i])
p <- w[n] * q
j <- findInterval(p, w)
Vectorize(function(p,j) if(w[n] <= 0) NA else
if(j < 1) x[i[1]] else
if(j == n) x[i[n]] else
if(w[j] == p) (x[i[j]] + x[i[j+1]]) / 2 else
x[i[j+1]])(p,j)
}
What will have the following results with simple input data.
medianWeighted(c(10, 40), c(1, 2))
#[1] 40
median(rep(c(10, 40), c(1, 2)))
#[1] 40
medianWeighted(c(10, 40), c(2, 1))
#[1] 10
median(rep(c(10, 40), c(2, 1)))
#[1] 10
medianWeighted(c(10, 40), c(1.5, 2))
#[1] 40
medianWeighted(c(10, 40), c(3, 4))
#[1] 40
median(rep(c(10, 40), c(3, 4)))
#[1] 40
medianWeighted(c(10, 40), c(1.5, 1.5))
#[1] 25
medianWeighted(c(10, 40), c(3, 3))
#[1] 25
median(rep(c(10, 40), c(3, 3)))
#[1] 25
medianWeighted(c(10, 40), c(0, 1))
#[1] 40
medianWeighted(c(10, 40), c(1, 0))
#[1] 10
medianWeighted(c(10, 40), c(0, 0))
#[1] NA
It can also be used for other qantiles
medianWeighted(1:10, 10:1, seq(0, 1, 0.25))
[1] 1 2 4 6 10
Compare with other methods.
#Functions from other Answers
weighted.median <- function(x, w) {
w <- w[order(x)]
x <- x[order(x)]
prob <- cumsum(w)/sum(w)
ps <- which(abs(prob - .5) == min(abs(prob - .5)))
return(x[ps])
}
my_wtd_q = function(x, w, prob, n = 4096)
with(density(x, weights = w/sum(w), n = n),
x[which.max(cumsum(y*(x[2L] - x[1L])) >= prob)])
weighted.quantile <- function(x, w, probs = seq(0, 1, 0.25),
na.rm = FALSE, names = TRUE) {
if (any(probs > 1) | any(probs < 0)) stop("'probs' outside [0,1]")
if (length(w) == 1) w <- rep(w, length(x))
if (length(w) != length(x)) stop("w must have length 1 or be as long as x")
if (isTRUE(na.rm)) {
w <- x[!is.na(x)]
x <- x[!is.na(x)]
}
w <- w[order(x)] / sum(w)
x <- x[order(x)]
cum_w <- cumsum(w) - w * (1 - (seq_along(w) - 1) / (length(w) - 1))
res <- approx(x = cum_w, y = x, xout = probs)$y
if (isTRUE(names)) {
res <- setNames(res, paste0(format(100 * probs, digits = 7), "%"))
}
res
}
Methods
M <- alist(
medRep = median(rep(DF$x, DF$w)),
isotone = isotone::weighted.median(DF$x, DF$w),
laeken = laeken::weightedMedian(DF$x, DF$w),
spatstat1 = spatstat.geom::weighted.median(DF$x, DF$w, type=1),
spatstat2 = spatstat.geom::weighted.median(DF$x, DF$w, type=2),
spatstat4 = spatstat.geom::weighted.median(DF$x, DF$w, type=4),
survey = survey::svyquantile(~x, survey::svydesign(id=~1, weights=~w, data=DF), 0.5)$x[1],
RAndres = weighted.median(DF$x, DF$w),
matrixStats = matrixStats::weightedMedian(DF$x, DF$w),
MichaelChirico = my_wtd_q(DF$x, DF$w, .5),
Leonardo = weighted.quantile(DF$x, DF$w, .5),
GKi = medianWeighted(DF$x, DF$w)
)
Results
DF <- data.frame(x=c(10, 40), w=c(1, 2))
sapply(M, eval)
# medRep isotone laeken spatstat1 spatstat2
# 40.00000 40.00000 40.00000 40.00000 25.00000
# spatstat4 survey RAndres matrixStats MichaelChirico
# 17.50000 40.00000 10.00000 30.00000 34.15005
# Leonardo.50% GKi
# 25.00000 40.00000
DF <- data.frame(x=c(10, 40), w=c(1, 1))
sapply(M, eval)
# medRep isotone laeken spatstat1 spatstat2
# 25.00000 25.00000 40.00000 10.00000 10.00000
# spatstat4 survey RAndres matrixStats MichaelChirico
# 10.00000 10.00000 10.00000 25.00000 25.05044
# Leonardo.50% GKi
# 25.00000 25.00000
In those two cases only isotone and GKi give identical results compared to what median(rep(x, w))
returns.

- 37,245
- 2
- 26
- 48
If you're working with the survey
package, assuming you've defined your survey design and x
is your variable of interest:
svyquantile(~x, mydesign, c(0.5))

- 1,544
- 1
- 16
- 27
I got here looking for weighted quantiles, so I thought I might as well leave for future readers what I ended up with. Naturally, using probs = 0.5 will return the weighted median.
I started with MichaelChirico's answer, which unfortunately was off at the edges. Then I decided to switch from density()
to approx()
. Finally, I believe I nailed the correction factor to ensure consistency with the default algorithm of the unweighted quantile()
.
weighted.quantile <- function(x, w, probs = seq(0, 1, 0.25),
na.rm = FALSE, names = TRUE) {
if (any(probs > 1) | any(probs < 0)) stop("'probs' outside [0,1]")
if (length(w) == 1) w <- rep(w, length(x))
if (length(w) != length(x)) stop("w must have length 1 or be as long as x")
if (isTRUE(na.rm)) {
w <- w[!is.na(w)]
x <- x[!is.na(x)]
}
w <- w[order(x)] / sum(w)
x <- x[order(x)]
cum_w <- cumsum(w) - w * (1 - (seq_along(w) - 1) / (length(w) - 1))
res <- approx(x = cum_w, y = x, xout = probs)$y
if (isTRUE(names)) {
res <- setNames(res, paste0(format(100 * probs, digits = 7), "%"))
}
res
}
When weights are uniform, the weighted quantiles are identical to regular unweighted quantiles:
x <- rnorm(100)
stopifnot(stopifnot(identical(weighted.quantile(x, w = 1), quantile(x)))
Example using the same data as in the weighted.mean()
man page.
x <- c(3.7, 3.3, 3.5, 2.8)
w <- c(5, 5, 4, 1)/15
stopifnot(isTRUE(all.equal(
weighted.quantile(x, w, 0:4/4, names = FALSE),
c(2.8, 3.33611111111111, 3.46111111111111, 3.58157894736842,
3.7)
)))
And this is for whoever solely wants the weighted median value:
weighted.median <- function(x, w, ...) {
weighted.quantile(x, w, probs = 0.5, names = FALSE, ...)
}

- 108
- 8
-
1Note there's a bug in this code: `w <- x[!is.na(x)]` in the `na.rm` if block is a typo and should be `w <- w[!is.na(w)]` – mal May 23 '23 at 02:32