2

I have some values in df:

# A tibble: 7 × 1
   var1
  <dbl>
1     0
2    10
3    20
4   210
5   230
6   266
7   267

that I would like to compare to a second dataframe called value_lookup

# A tibble: 4 × 2
   var1 value
  <dbl> <dbl>
1     0     0
2   200    10
3   230    20
4   260    30

In particual I would like to make a join based on >= meaning that a value that is greater or equal to the number in var1 gets a values of x. E.g. take the number 210 of the orginal dataframe. Since it is >= 200 and <230 it would get a value of 10.

Here is the expected output:

  var1 value
1    0     0
2   10     0
3   20     0
4  210    10
5  230    20
6  266    30
7  267    30

I thought it should be doable using {fuzzyjoin} but I cannot get it done.

value_lookup <- tibble(var1 = c(0, 200,230,260),
                       value = c(0,10,20,30))

df <- tibble(var1 = c(0,10,20,210,230,266,267))

library(fuzzyjoin)
fuzzyjoin::fuzzy_left_join(
  x = df, 
  y = value_lookup ,
  by = "var1",
  match_fun = list(`>=`)
) 
Julian
  • 6,586
  • 2
  • 9
  • 33

2 Answers2

4

An option is also findInterval:

df$value <- value_lookup$value[findInterval(df$var1, value_lookup$var1)]

Output:

  var1 value
1    0     0
2   10     0
3   20     0
4  210    10
5  230    20
6  266    30
7  267    30

As you're mentioning joins, you could also do a rolling join via data.table with the argument roll = T which would look for same or closest value preceding var1 in your df:

library(data.table)

setDT(value_lookup)[setDT(df), on = 'var1', roll = T]
arg0naut91
  • 14,574
  • 2
  • 17
  • 38
  • Thank you for your nice answer. Does this work also in the other direction, i.e. for `<`? – Julian Nov 30 '22 at 09:01
  • 1
    I'm not aware of that option in `findInterval`, but you could do the reverse in `data.table`, e.g. `setDT(value_lookup)[setDT(df), on = 'var1', roll = -Inf]` – arg0naut91 Nov 30 '22 at 09:26
1

You can use cut:

df$value <- value_lookup$value[cut(df$var1, 
                                   c(value_lookup$var1, Inf), 
                                   right=F)]
# # A tibble: 7 x 2
#    var1 value
#   <dbl> <dbl>
# 1     0     0
# 2    10     0
# 3    20     0
# 4   210    10
# 5   230    20
# 6   266    30
# 7   267    30
Robert Hacken
  • 3,878
  • 1
  • 13
  • 15