2

I have the following 2 data tables:

DT1 <- data.table(A = c(100,50,10), B = c("Good","Ok","Bad"))
DT1
     A    B
1: 100 Good
2:  50   Ok
3:  10  Bad

and

DT2 <- data.table(A = c(99,34,5,"",24,86))
DT2
    A
1: 99
2: 34
3:  5
4:   
5: 24
6: 86   

What I would like to return when joining DT1 and DT2 is

DT2
    A       B
1: 99    Good
2: 34    Ok
3:  5    Bad
4:       NA
5: 24    Ok
6: 86    Good

The "roll" option in data.table is only for "nearest" match so it doesnt work in my case. Is there any way I can do such lookup with data.table?

Uwe
  • 41,420
  • 11
  • 90
  • 134
southwind
  • 636
  • 4
  • 15
  • 1
    The value 24 is closer to `Bad` than to `Ok`, at least as measured by absolute distance. Then why did you choose `Ok` as the matched value? – Tim Biegeleisen Jul 19 '18 at 01:51
  • https://stackoverflow.com/questions/16095680/merge-dataframes-on-matching-a-b-and-closest-c – BENY Jul 19 '18 at 01:52

2 Answers2

3

The rolling join does work for me if rolled backwards (NOCB = next obervation carried backwards):

library(data.table)
DT1 <- data.table(A = c(100, 50, 10), B = c("Good", "Ok", "Bad"))
DT2 <- data.table(A = c(99, 34, 5, "", 24, 86))

DT2[, A := as.numeric(A)]
DT1[DT2, on = "A", roll = -Inf]
    A    B
1: 99 Good
2: 34   Ok
3:  5  Bad
4: NA <NA>
5: 24   Ok
6: 86 Good

Note that this does only work if both columns A are numeric (or integer). By using "", the OP has turned DT2$A into a character column.

Uwe
  • 41,420
  • 11
  • 90
  • 134
  • 2
    OP writes "DT2" above their desired output, so could do `DT2[, B := DT1[DT2, on = "A", roll = -Inf, x.B]]` – Frank Jul 19 '18 at 12:21
  • That is brilliant. That exactly the answer I was looking for. Now I learnt that you can set "roll" to be "backwards" – southwind Jul 20 '18 at 00:35
1

Here is a base R approach

df1 <- as.data.frame(DT1)
df2 <- as.data.frame(DT2)

df2$B <- apply(df2, 1, function(x) {
    if(x != "") df1$B[which.min(abs(as.numeric(x) - df1$A))] else NA
})
df2
#    A    B
# 1 99 Good
# 2 34   Ok
# 3  0  Bad
# 4    <NA>
# 5 24  Bad
# 6 86 Good

Or the same using data.tables

DT2[, B := apply(DT2, 1, function(x) 
    if(x != "") DT1$B[which.min(abs(as.numeric(x) - DT1$A))] else NA)]
DT2
#    A    B
#1: 99 Good
#2: 34   Ok
#3:  0  Bad
#4:      NA
#5: 24  Bad
#6: 86 Good

We match based on the smallest absolute difference between DT1$A and DT2$A values.


Sample data

DT1 <- data.table(A = c(100,50,0), B = c("Good","Ok","Bad"))
DT2 <- data.table(A = c(99,34,0,"",24,86))
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68