I am tasked with determining the percentile rank of variables in a sample dataset, compared to a reference dataset.
In R version 3.6.1, I've found that ecdf() does most of what I want, but I'm running into a problem: the percentile rank of the minimum value of the dataset is not 0.
x <- seq(0.5, 10, length.out = 100) #create numeric variable
summary(x) # minimum value is 0.5
Fn_x <- ecdf(x) # create ecdf function based on numeric variable (reference dataset)
Fn_x(c(0, 0.5, 1, 5, 10)) #calculate percentile rank of sample dataset
# [1] 0.00 0.01 0.06 0.47 1.00
The percentile rank of the minimum value (0.5) comes back as 0.01, but since this is the minimum value of the reference dataset, I expect it to return 0.
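As far as I can tell, this happens because ecdf() evaluates to the proportion of reference values less than or equal to the query value, so the minimum always gets 1/n rather than 0. A minimal check of that reading, reusing the x defined above (the variable names `at_or_below` and `strictly_below` are just for illustration):

```r
x <- seq(0.5, 10, length.out = 100)  # reference dataset from above
Fn_x <- ecdf(x)

# Proportion of reference values <= 0.5: only the minimum itself counts
at_or_below <- mean(x <= 0.5)

# Proportion of reference values strictly below 0.5: none
strictly_below <- mean(x < 0.5)

# ecdf() agrees with the "<=" count, not the "<" count
all.equal(Fn_x(0.5), at_or_below)
strictly_below
```

So the 0.01 is simply 1 out of 100 values, which is not the definition of percentile rank I'm after.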
I've found that dplyr::percent_rank() returns the answer I expect (i.e. the percentile rank of 0.5 is 0). However, it only ranks the values of a single vector against each other, and I cannot figure out how to use it the way I use ecdf(), where I calculate the percentile rank of a sample dataset relative to a separate reference dataset.
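To show what I mean, here is a small sketch of the within-vector behaviour. It assumes my reading of the dplyr documentation is right, namely that percent_rank() rescales the minimum rank to [0, 1] as (rank - 1) / (n - 1); the base-R line below is my own reconstruction of that, not dplyr's code:

```r
x <- seq(0.5, 10, length.out = 100)  # reference dataset from above

# Base-R reconstruction (assumption): (minimum rank - 1) / (n - 1)
pr <- (rank(x, ties.method = "min") - 1) / (length(x) - 1)
range(pr)  # the minimum maps to 0 and the maximum maps to 1

# If dplyr is installed, compare against percent_rank() on the same vector
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(all.equal(pr, dplyr::percent_rank(x)))
}
```

This gives 0 for the minimum, but there is no argument for passing a second (sample) vector to be ranked against x.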
Please provide a solution to either 1) use ecdf() to return percentile ranks of 0 for minimum values in the dataset, or 2) use dplyr::percent_rank() in a similar manner.