
I am trying to create a new variable in a data.table. For each observation, it should compare an existing variable in the data.table to a vector and return the index of the first element of that vector that is greater than the observation's value.

Example

ComparatorVector <- seq(1000, 200000, 1000)
Variable <- runif(10, min = 1000, max = 200000)

For each observation in Variable, I'd like to know the index of the first element of ComparatorVector that is larger than that observation.

I've played around with min(which()), but couldn't get it to search through the ComparatorVector for each observation. I also looked at the match() function, but couldn't find a way to make it return anything other than the index of an exact match.

JotHa

1 Answer


One option is findInterval, which requires ComparatorVector to be sorted in increasing order (as it is here):

findInterval(Variable, ComparatorVector) +1
#[1] 190 152  99 107  38 148 114  95  53  73

Or with sapply

sapply(Variable, function(x) which(ComparatorVector > x)[1])
#[1] 190 152  99 107  38 148 114  95  53  73
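As a quick cross-check that the two approaches agree, here is a minimal sketch with a fixed seed. The seed value and the names `idx_fi` and `idx_sa` are arbitrary choices for illustration, not part of the original example.

```r
# Reproducible version of the example data (the seed is an arbitrary choice).
set.seed(42)
ComparatorVector <- seq(1000, 200000, 1000)
Variable <- runif(10, min = 1000, max = 200000)

# findInterval() counts how many comparator values are <= x, so adding 1
# gives the index of the first comparator value strictly greater than x.
idx_fi <- findInterval(Variable, ComparatorVector) + 1L

# Same idea with an explicit search for each element of Variable.
idx_sa <- sapply(Variable, function(x) which(ComparatorVector > x)[1])

# TRUE when both approaches return the same indices.
all(idx_fi == idx_sa)
```

Inside a data.table (called `DT` here purely as a placeholder), the same expression could presumably be assigned as a new column with `:=`, for example `DT[, idx := findInterval(Variable, ComparatorVector) + 1L]`.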
akrun
  • Thank you very much for your help! Any suggestions on how to handle variables that are above the last variable in the ComparatorVector? – JotHa Feb 18 '21 at 17:51
  • @user9504090 i didn't get your question. I was trying to answer the question based on the description in the post. – akrun Feb 18 '21 at 17:58
  • Yup. You're right. What you have provided should work. My data is just not limited to the highest value in ComparatorVector, and I was wondering how to deal with observations that are above that value, but I could then just redefine the NA outputs. – JotHa Feb 18 '21 at 18:07
  • @user9504090 you could `replace` with `NA` as I am still not clear about the logic you wanted to implement – akrun Feb 18 '21 at 18:08
  • My data (what I called `Variable` above) actually contains observations that are greater than 200000 (the upper limit I defined in `ComparatorVector` above). I was just wondering how to deal with the results of your approach for these cases. However, as you said and I also realized: The solution should just be to replace the results of your approach that are NA. – JotHa Feb 18 '21 at 18:11
  • @user9504090 To make this easier, I would cross-check with a small example – akrun Feb 18 '21 at 18:12
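The small cross-check suggested in the comments could look like the sketch below. The input values are hand-picked assumptions covering the edge cases. Note that the two approaches differ for values at or above the largest element of ComparatorVector: the sapply version returns NA, while findInterval(...) + 1 returns length(ComparatorVector) + 1, which is not a valid index and may need to be replaced explicitly.

```r
ComparatorVector <- seq(1000, 200000, 1000)

# Hand-made input: below the range, on a boundary, inside the range,
# equal to the maximum, and above the maximum.
Variable <- c(500, 1000, 1500, 200000, 250000)

idx <- findInterval(Variable, ComparatorVector) + 1L

# Values >= max(ComparatorVector) yield length(ComparatorVector) + 1,
# which points past the end of the vector, so mark them as NA.
idx[idx > length(ComparatorVector)] <- NA_integer_
idx
# [1]  1  2  2 NA NA

# The sapply version gives NA for those cases without extra handling.
idx_sa <- sapply(Variable, function(x) which(ComparatorVector > x)[1])
identical(idx, idx_sa)
```

Whether NA is the right result for out-of-range observations depends on the intended logic; another sentinel value could be substituted in the same replacement step.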