
I have the following example table from which I need to find the median age of a herd of animals. Not only does it contain an age of 0, it also stores the data as a grouped frequency: the number of animals at each age.

library(tidyverse)
a <- data.frame(Age = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
                Individuals = c(3655, 2535, 898, 235, 559, 265, 258, 3659, 7895, 3655))
# My attempt (it does not produce the median):
a %>% summarise(Age = as.numeric(Age),
                Median = sort(as.numeric(Age) * Individuals / sum(Individuals)))

I understand that calling median() on Age directly does not work, because it ignores the counts. I tried to be clever with median(rep(a$Age, a$Individuals)), but the memory consumption was too much, and I expect it to fail on a larger dataset.
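To make the problem concrete, here is a small sketch using the same table. Plain `median()` on `Age` treats each of the ten ages as a single observation, while the `rep()` workaround gives the right answer but allocates one element per animal, so its memory use grows with the total count rather than with the number of ages.

```r
a <- data.frame(Age = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
                Individuals = c(3655, 2535, 898, 235, 559, 265, 258, 3659, 7895, 3655))

median(a$Age)                      # 4.5: the counts are ignored entirely
length(rep(a$Age, a$Individuals))  # 23614: one element per animal
median(rep(a$Age, a$Individuals))  # 7: correct, but the vector scales with the herd size
```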

Miloop
  • There is a `weighted.median` function in [spatstat](https://cran.r-project.org/web/packages/spatstat/), [among others](https://stackoverflow.com/questions/2748725/is-there-a-weighted-median-function). –  Feb 18 '22 at 18:11

2 Answers


You could be a bit clever and find the first age at which the cumulative count exceeds half the total:

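a %>%
  arrange(Age) %>%  # cumsum() below assumes ages are in increasing order
  # findInterval() counts the ages whose running total is <= half the herd;
  # the next age is the one that contains the middle animal
  summarise(median = Age[findInterval(sum(Individuals)/2, cumsum(Individuals)) + 1])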

  median
1      7
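Tracing the intermediate values on the example data may help show why this works. The sketch below is base R (no tidyverse) and uses the same `a`; `b` is a made-up two-row table added only to illustrate an edge case. When the total count is even and half of it lands exactly on a cumulative boundary, the one-liner returns the upper of the two middle values rather than their average, which is what `median()` would report on the expanded data.

```r
a <- data.frame(Age = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
                Individuals = c(3655, 2535, 898, 235, 559, 265, 258, 3659, 7895, 3655))

cum  <- cumsum(a$Individuals)   # running total of animals up to each age
half <- sum(a$Individuals) / 2  # 11807 for this table
idx  <- findInterval(half, cum) # 7: ages 0..6 have running totals <= 11807
a$Age[idx + 1]                  # 7: first age whose running total exceeds half

# Edge case (hypothetical data): 4 animals, half = 2 sits exactly on cum[1]
b <- data.frame(Age = c(1, 2), Individuals = c(2, 2))
b$Age[findInterval(sum(b$Individuals) / 2, cumsum(b$Individuals)) + 1]  # 2
median(rep(b$Age, b$Individuals))                                       # 1.5
```

For the question's data the two approaches agree, because both middle animals (ranks 11807 and 11808) fall in the age-7 group.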
Onyambu

You can use tidyr's uncount() to expand the original data frame to one row per animal and then use the standard median function.

a %>% uncount(Individuals) %>% summarise(Median=median(Age))
  Median
1      7

And to check:

> sum(a$Individuals)/2
[1] 11807
> sum(a$Individuals[1:7])
[1] 8405
> sum(a$Individuals[1:8])
[1] 12064

All good.
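The manual check above amounts to finding the group that contains the middle observation in the cumulative counts, which can be done without expanding the data. Below is a minimal sketch of that idea, assuming one column of values and one column of non-negative counts; `freq_median` is a hypothetical helper written for illustration, not a function from any package. It only allocates vectors as long as the number of groups, and unlike the `findInterval` one-liner it averages the two middle values when the total count is even, so it should match `median(rep(values, counts))`.

```r
# Hypothetical helper: median of a frequency table without expanding it.
freq_median <- function(values, counts) {
  ord    <- order(values)        # sort the groups by value
  values <- values[ord]
  cum    <- cumsum(counts[ord])  # cumulative count per group
  n      <- cum[length(cum)]     # total number of observations
  # ranks of the middle observation(s): one rank if n is odd, two if even
  ranks  <- unique(c(floor((n + 1) / 2), ceiling((n + 1) / 2)))
  # for each middle rank, the first group whose cumulative count reaches it
  groups <- vapply(ranks, function(r) which(cum >= r)[1], integer(1))
  mean(values[groups])
}

a <- data.frame(Age = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
                Individuals = c(3655, 2535, 898, 235, 559, 265, 258, 3659, 7895, 3655))

freq_median(a$Age, a$Individuals)  # 7
freq_median(c(1, 2), c(2, 2))      # 1.5, same as median(c(1, 1, 2, 2))
```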

Limey
  • OP just stated that the memory consumption for this is too much. It also won't work for large counts. – Onyambu Feb 18 '22 at 17:51
  • In which case, my rough-and-ready check provides a memory-minimal custom method for obtaining the median. – Limey Feb 18 '22 at 18:20
  • I do not think your check provides the median. You just used the value obtained from uncount to do a quick check. – Onyambu Feb 18 '22 at 18:22