I am looking for a way to create a new `data.table` column within a piped sequence, using `grepl` to look for any occurrence of a particular string.
I have looked here and here for help, and there seem to be many questions around this topic, but none of them seems to directly address my issue.
Also, I may be misunderstanding the `data.table` syntax; I am working from the Reference semantics vignette. The code below shows two approaches that could be piped/chained but don't seem to work. The last option, where the `data.table` column is created explicitly, seems to work, but I am wondering whether it can be chained/piped.
To my understanding, using `lapply` within a `data.table` will apply a function to the entire column (e.g. `sum`, `mean`, `na.approx`, which I found out from another posted question) but will not work on a row-wise basis. Also, I can apply a function to each row in a given column using `new_col := function(x)`. So I would have thought one of those would work.
I am (only somewhat) aware that `grepl` is expecting a single value but a vector is being supplied, and I am unsure how to fix that.
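For context on that warning, here is a minimal base-R sketch of the argument order. `grepl` takes the pattern first and the vector to search second (`grepl(pattern, x)`), and it is vectorised over `x`, so one pattern string is tested against every element. The names `pattern` and `insects` are hypothetical and used only for this illustration.

```r
# Hypothetical inputs, mirroring the kind of strings used in this question
pattern <- paste(c("housefly", "house fly", "HOUSEFLY", "HOUSE FLY"), collapse = "|")
insects <- c("housefly", "HOUSE FLY", "dragonfly", "dragon fly")

# One pattern string, tested against each element of the character vector
matches <- grepl(pattern, insects)
print(matches)
```

If the two arguments are swapped, the character vector lands in `pattern`, which is what produces the "argument 'pattern' has length > 1" warning.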
Any help is appreciated, thanks.
> library(data.table)
>
> a = c("housefly",
+ "house fly",
+ "HOUSEFLY",
+ "HOUSE FLY")
>
> dt = data.table(insect = c("housefly",
+ "house fly",
+ "HOUSEFLY",
+ "HOUSE FLY",
+ "dragonfly",
+ "dragon fly"))
>
> # does not work but I could put this in chain/pipe
> dt[, fly_check := sapply(.SD, grepl, paste(a, collapse = "|")), .SDcols = "insect"]
Warning message:
In FUN(X[[i]], ...) :
argument 'pattern' has length > 1 and only the first element will be used
> dt
insect fly_check
1: housefly TRUE
2: house fly TRUE
3: HOUSEFLY TRUE
4: HOUSE FLY TRUE
5: dragonfly TRUE
6: dragon fly TRUE
>
> # does not work but I could put this in chain/pipe
> dt[, fly_check := ifelse(grepl(insect, paste(a, collapse = "|")), TRUE, FALSE)]
Warning message:
In grepl(insect, paste(a, collapse = "|")) :
argument 'pattern' has length > 1 and only the first element will be used
> dt
insect fly_check
1: housefly TRUE
2: house fly TRUE
3: HOUSEFLY TRUE
4: HOUSE FLY TRUE
5: dragonfly TRUE
6: dragon fly TRUE
>
> # works but can't be chained/piped
> dt$fly_check = sapply(dt$insect, grepl, pattern = paste(a, collapse = "|"))
> dt
insect fly_check
1: housefly TRUE
2: house fly TRUE
3: HOUSEFLY TRUE
4: HOUSE FLY TRUE
5: dragonfly FALSE
6: dragon fly FALSE
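In both failing attempts above, the `insect` column ends up in `grepl`'s first argument, which is `pattern`. The sketch below passes the collapsed pattern first and the column second, which should allow the assignment to stay inside a `data.table` chain. The names `pat`, `house_flies` and `alt` are introduced only for illustration, and the `ignore.case` variant is an assumption about the intended match rather than part of the original code.

```r
library(data.table)

a <- c("housefly", "house fly", "HOUSEFLY", "HOUSE FLY")
dt <- data.table(insect = c("housefly", "house fly", "HOUSEFLY",
                            "HOUSE FLY", "dragonfly", "dragon fly"))

# Collapse the candidate strings into a single regular expression
pat <- paste(a, collapse = "|")

# Pattern first, column second; := adds the column by reference,
# and the second [ ] filters the updated table in the same chain
house_flies <- dt[, fly_check := grepl(pat, insect)][fly_check == TRUE]

# Assumed alternative: one case-insensitive pattern with an optional space
alt <- grepl("house ?fly", dt$insect, ignore.case = TRUE)
```

Because `grepl` is already vectorised over its second argument, no `sapply` or `ifelse` wrapper is needed here.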