5

I am trying to create a variable that is a logical value when comparing one character string to more than two other character strings in a data.table and I need to ignore NA's.

Sample data for D2:

structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
"char1", "char1")), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000015eb1261ef0>)

Attempted a proposed solution below:

D2[, Match := apply(sapply(.SD, `==`, D2[, "var1"]), 1, any), .SDcols = 
c("var2", "var3")]

Result for a003 is TRUE whereas it should be FALSE because var1 and var3 don't match:

structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
"char1", "char1"), Match = c(TRUE, TRUE, TRUE)), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 
0x0000015eb1261ef0>)

Desired Result:

structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
"char1", "char1"), Match = c(TRUE, TRUE, FALSE)), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 
0x0000015eb1261ef0>)
user3594490
  • 1,949
  • 2
  • 20
  • 26
  • Your desired result is identical to your sample data. Did you forget to include the proper expected output? – Maurits Evers Sep 17 '19 at 22:05
  • @Maurits Evers Never fails. I just edited the desired result. – user3594490 Sep 17 '19 at 22:26
  • Thanks for updating. Do you only want to compare `var1` with `var2`? What about `var3`? I'm still trying to understand the rules/logic behind your expected output. For row 3, is `Match = FALSE` because `var1 != var2`? Why is `Match = TRUE` for row 2? – Maurits Evers Sep 17 '19 at 22:55
  • @MauritsEvers Yes exactly, I have more than two columns that I need to match with var1. I can not find one example of using mapply with more than two variables. ID a003 is not a match because var1 does not match one of either var2 or var3. – user3594490 Sep 17 '19 at 23:08

4 Answers4

4

How about the following

setDT(D1)
D1[, Match := apply(sapply(.SD, `==`, D1[, "var1"]), 1, any), .SDcols = c("var2", "var3")]
D1
#ID  var1  var2  var3 Match
#1: a001 char1 char1 char1  TRUE
#2: a002 char1  <NA> char1  TRUE
#3: a003 char2 char1 char1 FALSE

Explanation: We compare entries in the sub-data.table defined through .SDcols with entries in D1[, "var1"]; if there is any match, return TRUE, else FALSE.


Update

In response to your comment, you can do

setDT(D1)
D1[, Match := apply(sapply(.SD, `==`, D1[, "var1"]), 1, all, na.rm = T), .SDcols = c("var2", "var3")]
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • If I change the value of var2 for ID a003 from char1 to char2 the result is TRUE whereas it should be FALSE because var3 is not a match with var1 even though var2 is. – user3594490 Sep 17 '19 at 23:41
  • @user3594490 That was not clear to me; I thought you wanted to have `TRUE` if at least one entry matched. I've made an edit, please take a look. – Maurits Evers Sep 18 '19 at 00:17
  • 1
    Yes! This works and my initial explanation needed to be more clear. – user3594490 Sep 18 '19 at 00:45
1

Another option is:

D2[, m := Reduce(`&`, data.table(
        sweep(as.matrix(.SD[, -"var1"]), 1L, .SD[["var1"]], function(x, y) is.na(x) | x==y))), 
    .SDcols=var1:var3]

Or using melt:

D2[, m := melt(D2, id.vars=c("ID", "var1"))[, any(var1==value, na.rm=TRUE), ID]$V1]

data:

library(data.table)
D2 <- structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
    "char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
        "char1", "char1")), row.names = c(NA, -3L), class = c("data.table", 
            "data.frame"))
setDT(D2)
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
1

We can do this with only apply by checking if the first value in the row is same as all the other elements in the row.

D2$match <- apply(D2[, -1], 1, function(x) all(x[1] == na.omit(x)))
D2
#     ID  var1  var2  var3 match
#1: a001 char1 char1 char1  TRUE
#2: a002 char1  <NA> char1  TRUE
#3: a003 char2 char2 char1 FALSE
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

Should be as simple as defaulting NA values:

else.na <- function(x, y) ifelse(is.na(x), y, x)
dt[,Match:= mapply('%in%', var1, else.na(var2, var1)) ,by = ID]
Bulat
  • 6,869
  • 1
  • 29
  • 52
  • This works for var1 and var2 and produces desired output given my example. How can I apply it to var3 also? For example,only if the value for var1 matches the values for var2 and var3 would the result be TRUE. – user3594490 Sep 17 '19 at 23:19