
I have a data.table as below:

library(data.table)

dt <- data.table(
  id = c(1:3),
  string = list(c("tree", "house", "star"),  
                c("house", "tree", "dense forest"), 
                c("apple", "orange", "grapes"))
  )

From this I want to get the rows that contain "tree" in the list column string. So I tried:

dt["tree" %in% string]
Empty data.table (0 rows) of 2 cols: id,string


dt["tree" %in% unlist(string)]
   id                  string
1:  1         tree,house,star
2:  2 house,tree,dense forest
3:  3     apple,orange,grapes

I am not sure which part I am doing wrong. I just need ids 1 and 2 to be returned. Any help is appreciated.

zx8754
Ricky

3 Answers


Or just

library(data.table)
dt[grep("\\btree\\b", string)]

   id                  string
1:  1         tree,house,star
2:  2 house,tree,dense forest

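It looks like what's wrong with your approach is that %in% doesn't look inside list elements: dt$string[1] is still a one-element list, so "tree" is compared against the whole element rather than against the strings in it.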

"tree" %in% dt$string[1]
[1] FALSE

Whereas grep() and grepl() accept anything they can coerce to a character vector

grepl("tree", dt$string[1])
[1] TRUE

as.character(dt$string[1])
[1] "c(\"tree\", \"house\", \"star\")"

This means it would also match other words that contain "tree" if, as @RonakShah reminded me, you don't use the word boundaries \b.
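To make the difference concrete, here is a small base R sketch. The list `x` is made up for illustration and is not the data from the question; it just contains a few edge cases.

```r
# Hypothetical list column: one character vector per row
x <- list(c("tree", "house"), c("treed", "atree"), c("tree house", "star"))

# `[` keeps the list wrapper, `[[` extracts the character vector inside it
"tree" %in% x[1]     # compares against the one-element list as a whole
"tree" %in% x[[1]]   # compares against "tree" and "house"

# grepl() coerces each list element to a single deparsed string before matching
plain   <- grepl("tree", x)         # substring match
bounded <- grepl("\\btree\\b", x)   # whole-word match

# Exact match against each element of each vector
exact <- vapply(x, function(el) "tree" %in% el, logical(1))

print(rbind(plain, bounded, exact))
```

Note that the word-boundary version still matches the third row, because "tree" is a whole word inside "tree house". If only an exact element match should count, iterating over the list is the safer option.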

Humpelstielzchen
    This would also pick the rows with the word `"treed"` or `"atree"`, any word with "tree" inside. – zx8754 Oct 10 '19 at 07:55

Since string is a list column, you need sapply or some other way to iterate over each element of the list.

library(data.table)
dt[sapply(string, function(x) any(x == "tree"))]

#   id                  string
#1:  1         tree,house,star
#2:  2 house,tree,dense forest
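A possible variant of the same idea, sketched with vapply and %in% instead of sapply and ==. This is only an alternative, not a correction: vapply checks that every result is a single logical value, and %in% returns FALSE rather than NA when an element contains NA. The names `keep` and `res` are just illustrative.

```r
library(data.table)

# Same data as in the question
dt <- data.table(
  id = 1:3,
  string = list(c("tree", "house", "star"),
                c("house", "tree", "dense forest"),
                c("apple", "orange", "grapes"))
)

# One TRUE/FALSE per row: does this row's vector contain exactly "tree"?
keep <- vapply(dt$string, function(x) "tree" %in% x, logical(1))

# Subset with the logical vector
res <- dt[keep]
print(res$id)
```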
Ronak Shah

We can also use str_detect from stringr

library(dplyr)
library(stringr)
dt %>%
   filter(str_detect(string, "\\btree\\b"))
#   id                    string
#1  1         tree, house, star
#2  2 house, tree, dense forest

Or using Map in data.table

dt[unlist(Map(`%in%`, "tree", string))]
#   id                  string
#1:  1         tree,house,star
#2:  2 house,tree,dense forest
akrun