R> x=c(92, 3, 1, 4, 15, 4)
R> rank(x)
[1] 6.0 2.0 1.0 3.5 5.0 3.5
rank() gives the ranks of the elements in a vector. I want to find the approximate ranks of numbers that may not be in the vector.
For example, the ordered elements are
R> sort(unique(x))
[1] 1 3 4 15 92
The weights (the number of times each distinct value occurs in x) are c(1, 1, 2, 1, 1), with the ranks c(1, 2, 3.5, 5, 6).
If I look up the approximate rank of a number not in the vector, for example 3.5, I get 2*1/3+3.5*2/3=3. Here 2 is the rank of 3 (the greatest number no greater than 3.5) and 3.5 is the rank of 4 (the least number no less than 3.5); 1/3 is the normalized weight of 3 and 2/3 is the normalized weight of 4.
When the lookup number is less than the least number in the vector (1 in the example), the approximate rank is 1. When the lookup number is greater than the greatest number in the vector (92), the rank is the length of the vector (6 in the example).
A simple piece of code is below. It is not efficient if I want to look up the approximate ranks of many numbers. It is also not robust: it fails when the lookup number is found in the vector or falls outside the range of the vector.
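# w: number of occurrences of each distinct value; r: its (tied) rank
w = sapply(split(rank(x), x), length)
r = sapply(split(rank(x), x), head, n=1)
d = data.frame(x=as.numeric(names(w)), w=w, r=r)
# i1: row of the greatest value <= 3.5; i2: row of the least value >= 3.5
i1 = with(d, max(seq_along(x)[x<=3.5]))
i2 = with(d, min(seq_along(x)[x>=3.5]))
# interpolation weights: distance to the opposite neighbour times the occurrence count
w1 = (d[i2, 'x']-3.5)*d[i1, 'w']
w2 = (3.5-d[i1, 'x'])*d[i2, 'w']
# weighted average of the two neighbouring ranks
(w1*d[i1, 'r']+w2*d[i2, 'r'])/(w1+w2)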
Is there a robust and efficient solution to this problem?
Edit
approxfun() seems to be relevant (for interpolation), but I can't find a way to make it give some nodes more weight. Some kind of weighted linear interpolation may be helpful.
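For reference, here is a minimal vectorized sketch of the weighted interpolation described above, not a definitive solution. The function name approx_rank and its arguments are hypothetical. It assumes the distance-scaled weights used in my code (which reduce to 1/3 and 2/3 for the lookup value 3.5), returns the tied rank when the lookup number is found in the vector, and returns 1 or length(x) outside the range. It uses findInterval() to locate the lower neighbour of every lookup number at once.

```r
# Hypothetical helper: approximate rank of each value in q relative to x.
approx_rank <- function(q, x) {
  ux <- sort(unique(x))                    # distinct values, ascending
  n  <- length(ux)
  w  <- tabulate(match(x, ux), nbins = n)  # occurrences of each distinct value
  r  <- rank(x)[match(ux, x)]              # (tied) rank of each distinct value

  lo <- findInterval(q, ux)                # 0 if q < min(x), n if q >= max(x)
  lo <- pmin(pmax(lo, 1L), n)              # clamp so indexing is always valid
  hi <- pmin(lo + 1L, n)

  # Assumed weighting: distance to the opposite neighbour times the count
  a <- (ux[hi] - q) * w[lo]
  b <- (q - ux[lo]) * w[hi]
  out <- (a * r[lo] + b * r[hi]) / (a + b)

  exact <- which(q == ux[lo])              # lookup number found in x
  out[exact] <- r[lo][exact]
  out[which(q < ux[1])] <- 1               # below the range
  out[which(q > ux[n])] <- length(x)       # above the range
  out
}

x <- c(92, 3, 1, 4, 15, 4)
approx_rank(c(0, 3.5, 4, 100), x)  # 1, 3, 3.5, 6
```

The setup (sorting and counting) is done once per call, so looking up many numbers costs one findInterval() pass rather than one scan of x per number.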