When I load the data from your csv, each of the 5 groups have identical values for parameter and density:
df
#># A tibble: 3,500 x 3
#> hrzn parameter density
#> <int> <dbl> <dbl>
#> 1 1 0.0183 0.00914
#> 2 1 0.0185 0.00905
#> 3 1 0.0187 0.00897
#> 4 1 0.0189 0.00888
#> 5 1 0.0191 0.00880
#> 6 1 0.0193 0.00872
#> 7 1 0.0194 0.00864
#> 8 1 0.0196 0.00855
#> 9 1 0.0198 0.00847
#>10 1 0.0200 0.00839
#># ... with 3,490 more rows
sapply(1:5, function(x) all(df$parameter[df$hrzn == x] == df$parameter[df$hrzn == 1]))
# [1] TRUE TRUE TRUE TRUE TRUE
sapply(1:5, function(x) all(df$density[df$hrzn == x] == df$density[df$hrzn == 1]))
# [1] TRUE TRUE TRUE TRUE TRUE
I'm not sure if this is a mistake or not, but clearly if you're worried about computation, anything you want to do on all the groups can be done 5 times faster by only doing it on a single group.
Anyway, to get the 10th and 90th centiles for each hrzn, you just need to see which parameter is adjacent to 0.1 and 0.9 on the cumulative distribution function. Let's generalize to working it out for all the groups in case there's an issue with the data or you want to repeat it with different data:
library(dplyr)
df %>%
mutate(hrzn = factor(hrzn)) %>%
group_by(hrzn) %>%
summarise(centile_10 = parameter[which(cumsum(density) > .1)[1]],
centile_90 = parameter[which(cumsum(density) > .9)[1]] )
#># A tibble: 5 x 3
#> hrzn centile_10 centile_90
#> <fct> <dbl> <dbl>
#>1 1 0.0204 0.200
#>2 2 0.0204 0.200
#>3 3 0.0204 0.200
#>4 4 0.0204 0.200
#>5 5 0.0204 0.200
Of course, they're all the same for the reasons mentioned above.
If you're worried about computation time (even though the above only takes a few milliseconds), and you don't mind opaque code, you could take advantage of the ordering to cut
the cumsum
of your entire density
column between 0 and 5 in steps of 0.1, to get all the 10th centiles, like this:
summary <- df[which((diff(as.numeric(cut(cumsum(df$density), seq(0,5,.1))) - 1) != 0)) + 1,]
summary <- summary[-(1:5)*10,]
summary$centile <- rep(1:9*10, 5)
summary
#> # A tibble: 45 x 4
#> hrzn parameter density centile
#> <int> <dbl> <dbl> <dbl>
#> 1 1 0.0204 0.00824 10
#> 2 1 0.0233 0.00729 20
#> 3 1 0.0271 0.00634 30
#> 4 1 0.0321 0.00542 40
#> 5 1 0.0392 0.00453 50
#> 6 1 0.0498 0.00366 60
#> 7 1 0.0679 0.00281 70
#> 8 1 0.103 0.00199 80
#> 9 1 0.200 0.00114 90
#> 10 2 0.0204 0.00824 10
#> # ... with 35 more rows
Perhaps I have misunderstood you and you are actually working in a 5-dimensional parameter space and want to know the parameter values at the 10th and 90th centiles of 5d density. In that case, you can take advantage of the fact that all groups are the same to calculate the 10th and 90th centiles for the 5-d density by simply taking the 5th root of these two centiles:
df %>%
mutate(hrzn = factor(hrzn)) %>%
group_by(hrzn) %>%
summarise(centile_10 = parameter[which(cumsum(density) > .1^.2)[1]],
centile_90 = parameter[which(cumsum(density) > .9^.2)[1]] )
#> # A tibble: 5 x 3
#> hrzn centile_10 centile_90
#> <fct> <dbl> <dbl>
#> 1 1 0.0545 0.664
#> 2 2 0.0545 0.664
#> 3 3 0.0545 0.664
#> 4 4 0.0545 0.664
#> 5 5 0.0545 0.664