dt1
has val
of interest in the correct sequence
library(data.table)
dt1 <- data.frame(id = 1,
key = c(paste0("a_",1:6),paste0("b_",1:6)),
val = c(122,128,134, rep(NA,3),c(110,112,114),rep(NA,3)),
var = c(rep("a",6),rep("b",6)))
id key val var
1 1 a_1 122 a
2 1 a_2 128 a
3 1 a_3 134 a
4 1 a_4 NA a
5 1 a_5 NA a
6 1 a_6 NA a
7 1 b_1 110 b
8 1 b_2 112 b
9 1 b_3 114 b
10 1 b_4 NA b
11 1 b_5 NA b
12 1 b_6 NA b
dt2
also has the val
of interest in the correct sequence but also some extra val
dt2 <- data.frame(id = 1,
key = c(paste0("a_",c(1,3:6)),paste0("b_",c(2,4:6))),
val = c(122,127,122,128,134,110,110,112,114),
var = c(rep("a",5),rep("b",4)))
id key val var
1 1 a_1 122 a
2 1 a_3 127 a
3 1 a_4 122 a
4 1 a_5 128 a
5 1 a_6 134 a
6 1 b_2 110 b
7 1 b_4 110 b
8 1 b_5 112 b
9 1 b_6 114 b
I want to match the sequence of values from dt2
to the sequence of values in dt1
and ignore extra values that are in dt2
.
I have tried a rolling join backwards as values of interest in dt2
are culstered towards the end of the sequence.
Current attempt:
setDT(dt1,key = c("id","var","val"))
setDT(dt2,key = c("id","var","val"))
dt1[dt2, roll = -Inf]
id key val var i.key
1: 1 a_1 122 a a_1 # wrong
2: 1 a_1 122 a a_4
3: 1 a_2 127 a a_3 # wrong
4: 1 a_2 128 a a_5
5: 1 a_3 134 a a_6
6: 1 b_1 110 b b_2 # wrong
7: 1 b_1 110 b b_4
8: 1 b_2 112 b b_5
9: 1 b_3 114 b b_6
It looks like duplicate values from dt2
that are in dt1
but not in the sequence I am looking for are causing an issue. Also I want to keep i.key
to know the original key as it will be used for other processing. I have also tried: merge(dt1,dt2)
Desired output:
id key val var i.key
1 a_1 122 a a_4
1 a_2 128 a a_5
1 a_3 134 a a_6
1 b_1 110 b b_4
1 b_2 112 b b_5
1 b_3 114 b b_6
I would be grateful for some guidance