Look up the approximate ranks of numbers not found in a vector

Question

R> x=c(92, 3, 1, 4, 15, 4)
R> rank(x)
[1] 6.0 2.0 1.0 3.5 5.0 3.5

rank() can give the ranks of elements in a vector. I want to find the approximate ranks of numbers that may not be in the vector.

For example, the distinct elements in sorted order are

R> sort(unique(x))
[1]  1  3  4 15 92

Their weights (the number of times each value occurs in x) are c(1, 1, 2, 1, 1), and their ranks are c(1, 2, 3.5, 5, 6).
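
For reference, these two vectors can be derived from x directly. This is only a minimal base R sketch; the names ux, w and r are illustrative and not part of any existing API.

```r
x <- c(92, 3, 1, 4, 15, 4)

ux <- sort(unique(x))                              # distinct values, ascending
w  <- tabulate(match(x, ux), nbins = length(ux))   # occurrences of each distinct value
r  <- rank(x)[match(ux, x)]                        # (average) rank of each distinct value

data.frame(x = ux, w = w, r = r)
```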

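If I look up the approximate rank of a number that is not in the vector, for example 3.5, I get 2*1/3 + 3.5*2/3 = 3. Here 2 is the rank of 3 (the greatest element no greater than 3.5) and 3.5 is the rank of 4 (the least element no less than 3.5); 1/3 and 2/3 are the normalized weights of 3 and 4. In the code further down, each neighbor's weight is also scaled by the lookup number's distance to the other neighbor. Since 3.5 is equidistant from 3 and 4, only the counts 1 and 2 matter in this example.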

When the lookup number is less than the least number in the vector (1 in the example), the approximate rank should be 1. When it is greater than the greatest number in the vector (92 in the example), the approximate rank should be the length of the vector (6 in the example).

My own simple attempt is below. It handles one lookup number at a time, so it is inefficient when I need the approximate ranks of many numbers. It is also not robust: it breaks when the lookup number is itself in the vector, or when it lies outside the range of the vector.

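q = 3.5                                          # the lookup number
w = sapply(split(rank(x), x), length)            # count of each distinct value
r = sapply(split(rank(x), x), head, n=1)         # rank of each distinct value
d = data.frame(x=as.numeric(names(w)), w=w, r=r)
i1 = with(d, max(seq_along(x)[x<=q]))            # row of the greatest value <= q
i2 = with(d, min(seq_along(x)[x>=q]))            # row of the least value >= q
w1 = (d[i2, 'x']-q)*d[i1, 'w']                   # weight of the lower neighbor
w2 = (q-d[i1, 'x'])*d[i2, 'w']                   # weight of the upper neighbor
(w1*d[i1, 'r']+w2*d[i2, 'r'])/(w1+w2)            # weighted mean of the two ranks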

Is there a robust and efficient solution to this problem?
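
One possible direction, offered only as a sketch: precompute the distinct values, their counts and their ranks once, then use findInterval() to locate the lower neighbor of every lookup number in a single vectorised call. The function name make_approx_rank is hypothetical, and the sketch assumes that neither x nor the lookup numbers contain NA. It uses the same weighting as the attempt above (distance to the other neighbor times the count) and the boundary rules described in the question.

```r
# Hypothetical helper: builds a vectorised lookup function for approximate ranks.
make_approx_rank <- function(x) {
  ux <- sort(unique(x))                              # distinct values, ascending
  w  <- tabulate(match(x, ux), nbins = length(ux))   # count of each distinct value
  r  <- rank(x)[match(ux, x)]                        # rank of each distinct value
  n  <- length(ux)

  function(q) {
    out <- rep(NA_real_, length(q))
    out[q <  ux[1]] <- 1            # below the range: rank 1
    out[q >  ux[n]] <- length(x)    # above the range: length of the vector
    out[q == ux[n]] <- r[n]         # exactly the maximum: its own rank

    lo  <- findInterval(q, ux)      # index of the greatest distinct value <= q
    mid <- which(lo >= 1 & lo < n)  # lookups with a neighbor on both sides
    i1  <- lo[mid]
    i2  <- i1 + 1
    w1  <- (ux[i2] - q[mid]) * w[i1]   # weight of the lower neighbor
    w2  <- (q[mid] - ux[i1]) * w[i2]   # weight of the upper neighbor
    out[mid] <- (w1 * r[i1] + w2 * r[i2]) / (w1 + w2)
    out
  }
}

x <- c(92, 3, 1, 4, 15, 4)
approx_rank <- make_approx_rank(x)
approx_rank(c(0, 1, 3.5, 4, 92, 100))   # 1, 1, 3, 3.5, 6, 6
```

A lookup number that equals an interior element gets w2 = 0, so the formula returns that element's own rank and avoids the 0/0 of the one-off version.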

Edit

approxfun() seems relevant for the interpolation, but I can't find a way to make it give some nodes more weight than others. Some form of weighted linear interpolation might help.
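
To illustrate the limitation, here is a small sketch of plain approxfun() on the example data (the names ux, r and f are illustrative). With rule = 2 it clamps lookups outside the range to the end values, but between nodes it interpolates by distance only, so the duplicated 4 gets no extra weight.

```r
x  <- c(92, 3, 1, 4, 15, 4)
ux <- sort(unique(x))            # distinct values, ascending
r  <- rank(x)[match(ux, x)]      # rank of each distinct value

# rule = 2: outside [min(ux), max(ux)] return the rank at the nearest end
f <- approxfun(ux, r, rule = 2)

f(3.5)         # 2.75, halfway between ranks 2 and 3.5, not the desired 3
f(c(0, 100))   # 1 and 6
```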

You could make a clever function that takes `x` as the input and returns a function that calculates approximate ranks. Depending on how efficient you need to be, using `data.table` or `Rcpp` could speed things up. — Gregor Thomas, Feb 10 '22 at 22:04
What do you have? Do you have the ranks and the values and weights or do you just have `x`?? If so, why not just `rank(c(new_numbers, x))[seq_along(new_numbers)]`? — Onyambu, Feb 10 '22 at 22:06
`rank(c(3.5, x))[1]`; `rank(c(-3.5, x))[1]`; `rank(c(103.5, x))[1]` — Onyambu, Feb 10 '22 at 22:10
I want the approx rank to be between 1 and the length of the vector. — user1424739, Feb 10 '22 at 23:51
Given that this function may have been considered by others, is there an implementation already somewhere? @GregorThomas — user1424739, Feb 10 '22 at 23:54
If I knew of a function that does this I would point you to it. — Gregor Thomas, Feb 10 '22 at 23:55
