Create new conditional column if string contains elements from list

Question

I'm trying to add a new column, keywords, that is TRUE if the word occurs in a list of keywords and FALSE if it doesn't. My keyword list contains more than 100 words, so adding the words manually is not an option.

keywordlist (sample):

thank
impressed
this

I have a data frame with the columns id and word; I have unnested the words and grouped by id:

id      word
1234    thank
1234    you
1234    very
1234    much
1567    i
1567    am
1567    not
1567    impressed
9654    what
9654    is
9654    this

I would like the result to look like this:

id      word       keywords
1234    thank      TRUE
1234    you        FALSE
1234    very       FALSE
1234    much       FALSE
1567    i          FALSE
1567    am         FALSE
1567    not        FALSE
1567    impressed  TRUE
9654    what       FALSE
9654    is         FALSE
9654    this       TRUE

The code I have tried is as follows:

df <- df %>%
  group_by(id) %>%
  mutate(keywords = ifelse(
  word == rowwise(keywordslist), TRUE, FALSE)

This raises the following error:

Error in mutate_impl(.data, dots) : Evaluation error: is.data.frame(data) is not TRUE.

I have also tried a slightly different variant with grepl:

df <- df %>% group_by(id) %>% mutate(keywords = ifelse( word == rowwise(grepl(keywordslist, word)), TRUE,FALSE)

This raised the following error:

Error in mutate_impl(.data, dots) : Evaluation error: is.data.frame(data) is not TRUE.
In addition: Warning message:
In grepl(keywordslist, keywords) : argument 'pattern' has length > 1 and only the first element will be used

I'm not sure if this is the correct way to approach this situation anymore. Any help is welcome.

This might be helpful to understand. https://stackoverflow.com/questions/1169248/test-if-a-vector-contains-a-given-element — Ronak Shah, Jun 15 '18 at 08:00

score 3 · Answer 1 · edited Jun 15 '18 at 07:16

df$keywords <- df$word %in% keywordslist

should do it
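
For reference, here is a self-contained sketch of the same idea on the sample data from the question. `%in%` is vectorised: it returns one logical value per element of its left-hand side, so no `ifelse()`, `rowwise()` or grouping is needed. The data frame is rebuilt here only so the snippet runs on its own, and `keywordslist` is assumed to be a plain character vector.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  id = c(1234L, 1234L, 1234L, 1234L, 1567L, 1567L, 1567L, 1567L,
         9654L, 9654L, 9654L),
  word = c("thank", "you", "very", "much", "i", "am", "not",
           "impressed", "what", "is", "this"),
  stringsAsFactors = FALSE
)

# Assumed to be a character vector (name as used in the question's code)
keywordslist <- c("thank", "impressed", "this")

# One TRUE/FALSE per row: is the word an exact member of the keyword vector?
df$keywords <- df$word %in% keywordslist

df[df$keywords, ]  # rows whose word is a keyword
```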

edited Jun 15 '18 at 07:16 by Rui Barradas · answered Jun 15 '18 at 07:15 by lebatsnok

I didn't understand the role of grouping by `id` though - so that is not included in my answer. – lebatsnok Jun 15 '18 at 07:17
and I presumed that `keywordslist` is a character vector (and so is `df$word`) – lebatsnok Jun 15 '18 at 07:18
Thanks! it was much easier than I thought. – Dennis Loos Jun 15 '18 at 07:28

patL · Accepted Answer · 2018-06-15T07:35:07.240

You can do something like:

library(dplyr)

 df1 %>% 
  mutate(keywords = word %in% keywordlist)

#  id      word keywords
#1  1234     thank     TRUE
#2  1234       you    FALSE
#3  1234      very    FALSE
#4  1234      much    FALSE
#5  1567         i    FALSE
#6  1567        am    FALSE
#7  1567       not    FALSE
#8  1567 impressed     TRUE
#9  9654      what    FALSE
#10 9654        is    FALSE
#11 9654      this     TRUE

or with base R:

df1$keywords <- sapply(df1, function(x) x %in% keywordlist)[,2]


#   id      word keywords
#1  1234     thank     TRUE
#2  1234       you    FALSE
#3  1234      very    FALSE
#4  1234      much    FALSE
#5  1567         i    FALSE
#6  1567        am    FALSE
#7  1567       not    FALSE
#8  1567 impressed     TRUE
#9  9654      what    FALSE
#10 9654        is    FALSE
#11 9654      this     TRUE

Is there a way to use this ```base``` R function on a column with a specific name? (in case that column is in a different place in each dataset) — Nick Byrd, Jan 09 '23 at 17:30
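
One possible way to do that, as a sketch: select the column by name with `[[` rather than by position, so the result does not depend on where the column sits. The variable `col` and the reordered data frame below are hypothetical, for illustration only.

```r
keywordlist <- c("thank", "impressed", "this")

# Hypothetical dataset in which the word column comes first rather than second
df1 <- data.frame(
  word = c("thank", "you", "not", "impressed"),
  id = c(1234L, 1234L, 1567L, 1567L),
  stringsAsFactors = FALSE
)

col <- "word"  # name of the column to test (illustrative)

# Look the column up by name, then test membership as before
df1$keywords <- df1[[col]] %in% keywordlist
df1
```

This also avoids running `%in%` over every column, which the `sapply()` version does before discarding all but the second result.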

score 0 · Answer 3 · answered Jun 15 '18 at 07:17

A dplyr approach could be:

library(dplyr)

df %>%
  mutate(keywords = grepl(paste(keywordlist, collapse = "|"), word))

which gives

     id      word keywords
1  1234     thank     TRUE
2  1234       you    FALSE
3  1234      very    FALSE
4  1234      much    FALSE
5  1567         i    FALSE
6  1567        am    FALSE
7  1567       not    FALSE
8  1567 impressed     TRUE
9  9654      what    FALSE
10 9654        is    FALSE
11 9654      this     TRUE

Sample data:

df <- structure(list(id = c(1234L, 1234L, 1234L, 1234L, 1567L, 1567L, 
1567L, 1567L, 9654L, 9654L, 9654L), word = c("thank", "you", 
"very", "much", "i", "am", "not", "impressed", "what", "is", 
"this")), .Names = c("id", "word"), class = "data.frame", row.names = c(NA, 
-11L))

keywordlist <- c("thank", "impressed", "this")
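
One caveat with the `grepl()` variant: a pattern built by collapsing the keywords with `|` matches substrings, so it can flag words that merely contain a keyword, whereas `%in%` requires an exact match. On the sample data above the two agree. The sketch below uses made-up words (`thankful`, `thistle`) to show the difference, and anchors the pattern with `^(...)$` as one way to restore whole-word matching. It assumes the keywords contain no regex metacharacters.

```r
keywordlist <- c("thank", "impressed", "this")
words <- c("thank", "thankful", "thistle", "is")  # made-up test words

# Exact membership test
exact <- words %in% keywordlist

# Unanchored alternation: TRUE whenever a keyword occurs anywhere in the word
pattern <- paste(keywordlist, collapse = "|")
partial <- grepl(pattern, words)

# Anchored alternation: the whole word must equal one of the keywords
anchored <- grepl(paste0("^(", pattern, ")$"), words)

data.frame(words, exact, partial, anchored)
```

Here `partial` is TRUE for `thankful` and `thistle` while `exact` and `anchored` are not, so the choice between the two approaches depends on whether partial matches are wanted.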
