
This is probably a simple question for those experienced in R, but it is something that I (a novice) am struggling with...

I have two examples of vectors that are common to the problem I am trying to solve, A and B:

A <- c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10)
B <- c(1,3,NA,NA,NA,NA,NA,NA,NA,NA,2,NA,9)

#and three scalars
R <- 4
t <- 5
N <- 3

There is a fourth scalar, n, where 0<=n<=N. In general, N <= R.

I want to find the n closest non-NA values to index t that fall within a search radius R centered on t, i.e., the search radius R comprises R+1 values. For example A, the search radius sequence is (3,NA,3,NA,4,NA,1), where the value at index t is NA, the middle value of the sequence.
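
As a minimal sketch of what such a search window looks like, the helper below (`search_window`, a hypothetical name) pulls out the elements around index `t` and clamps the window to the bounds of the vector. It assumes the window spans `radius` positions on each side of the centre; a radius of 3 is what reproduces the seven-value sequence quoted above.

```r
A <- c(1, 3, NA, 3, NA, 4, NA, 1, 7, NA, 2, NA, 9, 9, 10)

# Hypothetical helper: elements of x within `radius` positions of index `centre`,
# clamped so the window never runs off either end of x.
search_window <- function(x, centre, radius) {
  lo <- max(1, centre - radius)
  hi <- min(length(x), centre + radius)
  x[lo:hi]
}

win <- search_window(A, centre = 5, radius = 3)
win  # 3 NA 3 NA 4 NA 1
```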

The expected answer can be one of two results for A:

answerA1 <- c(3,4,1)

OR

answerA2 <- c(3,4,3)

The expected answer for B:

answerB <- c(1,3)

How would I accomplish this task in the most time- and space-efficient manner? One liners, loops, etc. are welcome. If I have to choose a preference, it is for speed!

Thanks in advance!

Note:

For this case, I understand that picking the third closest non-NA value may mean choosing whether it falls to the right or to the left of t (as shown by the two possible answers above). I do not have a preference for whether this value falls to the left or the right of t, but if there is a way to leave it to random chance, that would be ideal (again, it is not a requirement).
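
One way the random left/right choice could be sketched is to sort by distance and use a random number as a secondary sort key, so that equally distant values come out in random order. `nearest_random` is a hypothetical name, and the sketch assumes the radius is an explicit distance cut-off around index `centre`.

```r
A <- c(1, 3, NA, 3, NA, 4, NA, 1, 7, NA, 2, NA, 9, 9, 10)

# Hypothetical helper: up to N non-NA values of x within `radius` positions of
# index `centre`, nearest first, with left/right ties resolved at random.
nearest_random <- function(x, centre, radius, N) {
  idx <- seq_along(x)
  d <- abs(idx - centre)
  keep <- idx[d <= radius & !is.na(x)]
  # Secondary random key: positions at equal distance end up in random order
  keep <- keep[order(d[keep], runif(length(keep)))]
  x[head(keep, N)]
}

set.seed(1)  # only to make the random draw reproducible
res <- nearest_random(A, centre = 5, radius = 3, N = 3)
res
```

With a radius of 3 the result always contains the 3 and the 4 next to `t` (in either order), followed by either the 3 on the left or the 1 on the right.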

thatWaterGuy
  • Not clear what `c(3,4,1)` is. Is it `c(R,t,n)`? If so, why is `c(3,4,3)` as good as `c(3,4,1)`? – Pierre Lapointe Jun 19 '17 at 20:04
  • Clarified by providing another example and expected answer. I am looking for the `n` closest non-NA values to index `t`. In general, `0<=n<=N`. For example A, both c(3,4,3) and c(3,4,1) are equally good answers because they are both the three closest non-NA values within the search radius `R` centered on `t`. – thatWaterGuy Jun 19 '17 at 20:09

3 Answers


A relatively short solution is:

# Order A by distance from position t and keep the first R*2 elements
orderedA <- A[order(abs(seq_len(length(A)) - t))][seq_len(R*2)]
# Number of values to return: at most N, and no more than the non-NA values available
n_obj <- min(N, sum(!is.na(orderedA)))
res <- na.omit(orderedA)[seq_len(n_obj)]

res
#[1] 3 4 3

Breaking this down a little more, the steps are:

  1. Order A by the absolute distance from the position of interest, t.
    • Code is: A[order(abs(seq_len(length(A)) - t))]
  2. Subset to the first R*2 elements (so this will get the elements on either side of t within R).
    • Code is: [seq_len(R*2)]
  3. Work out how many elements to return: the smaller of N and the number of non-NA values.
    • Code is: min(N, sum(!is.na(orderedA)))
  4. Drop NA
    • Code is: na.omit()
  5. Take the first n_obj elements (the count determined in step 3).
    • Code is: [seq_len(n_obj)]
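
The steps above can also be wrapped into a single function. This is only a sketch: `closest_non_na` is a hypothetical name, and unlike the code above it filters on distance explicitly (`d <= R`) instead of taking a fixed number of elements, which is an assumption about how the radius should behave, in particular when `t` is near either end of the vector.

```r
A <- c(1, 3, NA, 3, NA, 4, NA, 1, 7, NA, 2, NA, 9, 9, 10)
B <- c(1, 3, NA, NA, NA, NA, NA, NA, NA, NA, 2, NA, 9)

# Hypothetical wrapper: up to N non-NA values of x within R positions of index t
closest_non_na <- function(x, t, R, N) {
  d <- abs(seq_along(x) - t)           # distance of every position from t
  ord <- order(d)                      # nearest first; ties keep the left element first
  ord <- ord[d[ord] <= R]              # stay inside the search radius
  vals <- x[ord]
  vals <- vals[!is.na(vals)]           # drop NA
  vals[seq_len(min(N, length(vals)))]  # at most N values, never padded with NA
}

resA <- closest_non_na(A, t = 5, R = 4, N = 3)  # 3 4 3
resB <- closest_non_na(B, t = 5, R = 4, N = 3)  # 3 1
```

Because `seq_len()` is used for the final subset, a window with no non-NA values returns an empty vector rather than `NA`.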
Mike H.
  • Thanks @Mike H.! What if we wanted to generalize this such that `0<=n<=N`? Where N is given as the argument (i.e., instead of a fixed `n`)? In general, `N<=R`. – thatWaterGuy Jun 19 '17 at 20:18
  • @thatWaterGuy, Can you elaborate a little more on what you mean? Do you want to pass `N` as an argument, and then `n` is determined randomly such that `0<=n<=N`? – Mike H. Jun 19 '17 at 20:21
  • `n` is determined by the number of non-NA elements within the search radius `R` centered on `t`, limited by an upper value of `N`, hence `0<=n<=N`. See example B in my edited question. I want to return at most `N` non-`NA` values within the search radius, but sometimes there are fewer than `N` non-`NA` values; this is what `n` represents. – thatWaterGuy Jun 19 '17 at 20:29
  • @thatWaterGuy, See my update - is that what you want? The last part was changed to select the first `min(N, # of NA)` elements – Mike H. Jun 19 '17 at 20:33
  • thanks for your time. For... `> B <- c(1,3,NA,NA,NA,NA,NA,NA,NA,NA,2,NA,9) > orderedB <- B[order(abs(seq_len(length(B)) - 5))][seq_len(4*2)] > ansB <- na.omit(orderedB)[seq_len(min(sum(is.na(orderedB)), 3))]` It returns: `# [1] 3 1 NA` It should be: `[1] 3 1` – thatWaterGuy Jun 19 '17 at 20:47
  • Ah yes, good point. I'll update - should just need to add another condition to the `min()`. @thatWaterGuy see my edits - this should give you what you want now for `A` and `B`. At this point, I'd recommend breaking out the code since there are a lot of nested functions – Mike H. Jun 19 '17 at 20:47
  • that's it! Much appreciated! – thatWaterGuy Jun 19 '17 at 21:01

Something like this?

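thingfinder <- function(A, R, t, n) {
  # Pad with NA so windows that run past either end of A stay valid
  padded <- c(rep(NA, R), A, rep(NA, R))
  centre <- t + R
  left <- padded[centre - seq_len(R)]   # distances 1..R to the left of t
  right <- padded[centre + seq_len(R)]  # distances 1..R to the right of t
  # Value at t first, then left/right interleaved by increasing distance
  raw_ans <- c(padded[centre], as.vector(rbind(left, right)))
  ans <- raw_ans[!is.na(raw_ans)]
  head(ans, n)  # at most n values, never padded with NA
}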

thingfinder(A=c(1,3,NA,3,NA,4,NA,1,7,NA,2,NA,9,9,10), R=3, t=5, n=3)
##  [1] 3 4 3

This would give priority to the left side, of course.

Matt Tyers

In case it is helpful to others, @Mike H. also provided me with a solution to return the index positions associated with the desired vector elements res:

A <- setNames(A, seq_len(length(A)))

orderedA <- A[order(abs(seq_len(length(A)) - t))][seq_len(R*2)]

n_obj <- min(N, sum(!is.na(orderedA)))

res <- na.omit(orderedA)[seq_len(n_obj)]

positions <- as.numeric(names(res))
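
An alternative sketch that avoids the round trip through names is to order the index positions themselves and then use them to look up the values. `closest_positions` is a hypothetical helper, and it assumes the radius is applied as an explicit distance filter around index `t`.

```r
A <- c(1, 3, NA, 3, NA, 4, NA, 1, 7, NA, 2, NA, 9, 9, 10)

# Hypothetical helper: positions of up to N non-NA values within R of index t
closest_positions <- function(x, t, R, N) {
  idx <- which(!is.na(x) & abs(seq_along(x) - t) <= R)  # non-NA positions in range
  idx <- idx[order(abs(idx - t))]                        # nearest first
  head(idx, N)                                           # at most N positions
}

positions <- closest_positions(A, t = 5, R = 4, N = 3)
positions     # 4 6 2
A[positions]  # 3 4 3
```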

thatWaterGuy