Minimum values in a dataset are not 0th percentile using ecdf function in R

Question

I am tasked with determining the percentile rank of values in a sample dataset relative to a reference dataset.

In R 3.6.1, I've found that ecdf() does what I want, except that the percentile rank it returns for the minimum value of the dataset is not 0.

x <- seq(0.5, 10, length.out = 100)  # create numeric variable (reference dataset)
summary(x)                           # minimum value is 0.5
Fn_x <- ecdf(x)                      # create ecdf function based on the reference dataset
Fn_x(c(0, 0.5, 1, 5, 10))            # calculate percentile ranks of the sample dataset
# [1] 0.00 0.01 0.06 0.47 1.00
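A quick way to see where the 0.01 comes from: the function returned by ecdf() gives the proportion of reference values that are less than or equal to its argument, so the minimum counts itself. The sketch below (assuming the same x as above) recomputes that proportion by hand and compares it with Fn_x.

```r
x <- seq(0.5, 10, length.out = 100)  # reference dataset from the question
Fn_x <- ecdf(x)
q <- c(0, 0.5, 1, 5, 10)             # sample values from the question

# ecdf(x)(t) is the share of reference values <= t, so the minimum
# (0.5) includes itself and gets 1/100 = 0.01 rather than 0
manual <- vapply(q, function(t) mean(x <= t), numeric(1))
all.equal(Fn_x(q), manual)  # TRUE
```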

The percentile rank of the minimum value (0.5) is 0.01, but since 0.5 is the minimum of the dataset, it should be 0.0.

I've found that dplyr::percent_rank() returns the answer I expect (i.e., the percentile rank of 0.5 is 0). However, I can't figure out how to use dplyr::percent_rank() the same way, that is, to calculate the percentile ranks of a sample dataset relative to a separate reference dataset.
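A possible route to option 2 without dplyr: within a single vector, dplyr::percent_rank() computes (min_rank(x) - 1) / (number of non-missing values - 1), i.e. the count of values strictly below each value divided by n - 1. The sketch below applies that same formula against a separate reference vector. percent_rank_ref() is a hypothetical helper name, not a dplyr function, and capping values above the reference maximum at 1 is an assumption of this sketch.

```r
# Hypothetical helper (not part of dplyr): percentile rank of each value in
# `new` relative to the reference vector `ref`, using the percent_rank()
# formula (number of reference values strictly below t) / (n - 1).
percent_rank_ref <- function(new, ref) {
  ref <- sort(ref[!is.na(ref)])  # findInterval() needs sorted input
  n <- length(ref)
  # With left.open = TRUE, findInterval() returns how many ref values are < t
  below <- findInterval(new, ref, left.open = TRUE)
  # Values above max(ref) would give n / (n - 1) > 1; cap them at 1
  # (assumption: Excel's PERCENTRANK.INC returns #N/A there instead)
  pmin(below / (n - 1), 1)
}

x <- seq(0.5, 10, length.out = 100)  # reference dataset from the question
percent_rank_ref(c(0, 0.5, 1, 5, 10), x)
# 0, 0, 6/99, 47/99, 1
```

For values that occur in the reference data, this should match dplyr::percent_rank() applied to the reference itself (ties get the rank of their lowest position). Unlike Excel's PERCENTRANK.INC, it does not interpolate for sample values that fall between reference points.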

Please provide a solution that either 1) uses ecdf() to return a percentile rank of 0 for the minimum value of the dataset, or 2) uses dplyr::percent_rank() in a similar manner.

The p-th percentile is typically the value x such that P(X <= x) = p. If you plug your minimum value in, P(X <= 0.5), why should that give 0 when it's perfectly possible to observe that particular value? — Dason, Aug 28 '19 at 14:31
How exactly are you defining the "percentile rank"? The ecdf gives the proportion of values less than or equal to a certain value. What is the mathematical definition of the value you want to calculate? — MrFlick, Aug 28 '19 at 14:33
I'm defining percentile rank as 'the relative rank of a value in a data set as a percentage representing the number of values less than or equal to the value'. I'm looking for an analog to the =PERCENTRANK.INC function in Excel (INC = inclusive of the first and last values in an array). — fwEco, Aug 28 '19 at 14:59
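In symbols, for a reference sample x_1, ..., x_n, the two quantities being compared differ in both the comparison and the denominator. F_n is what ecdf() returns; PR is the formula dplyr::percent_rank() uses, which should also match PERCENTRANK.INC for values that occur in the data. Note that PR counts values strictly below t.

```latex
F_n(t) = \frac{1}{n}\,\#\{\, i : x_i \le t \,\},
\qquad
\mathrm{PR}(t) = \frac{\#\{\, i : x_i < t \,\}}{n - 1}
```

For an untied minimum this gives F_n = 1/n (here 1/100 = 0.01) but PR = 0; for an untied maximum both give 1.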

