
I would like to create a new column, unsure, that contains the word "unsure" if the freetext column contains any of the following phrases: "too soon", "to tell". The freetext column should stay unchanged, and the new column should be NA when freetext contains none of those phrases. Currently the data looks like:

   id               freetext date
1   1           its too soon    1
2   2           I'm not sure    2
3   3                   pink   12
4   4                 yellow   15
5   5       too soon to tell   20
6   6 I think it is too soon    2
7   7                 5 days    6
8   8                    red    7
9   9        its been 2 days    3
10 10       too soon to tell   11

The data:

df <- structure(list(id = c("1","2","3","4","5","6","7","8","9","10"), 
            freetext = c("its too soon", "I'm not sure",
"pink","yellow","too soon to tell","I think it is too soon","5 days","red",
"its been 2 days","too soon to tell"), 
date = c("1","2","12","15","20","2","6","7","3","11")), class = "data.frame", row.names = c(NA,-10L))

And I would like it to look like:

    id               freetext unsure date
1   1           its too soon unsure    1
2   2           I'm not sure   <NA>    2
3   3                   pink   <NA>   12
4   4                 yellow   <NA>   15
5   5       too soon to tell unsure   20
6   6 I think it is too soon unsure    2
7   7                 5 days   <NA>    6
8   8                    red   <NA>    7
9   9        its been 2 days   <NA>    3
10 10       too soon to tell unsure   11
oguz ismail
Gabriella

1 Answer


You can use if_else() with str_detect() for pattern matching:

library(tidyverse)
# str_detect() is TRUE when freetext contains either phrase; if_else() then returns "unsure" or NA
df %>% mutate(unsure = if_else(str_detect(freetext, 'too soon|to tell'), 'unsure', NA_character_))

#   id               freetext date unsure
#1   1           its too soon    1 unsure
#2   2           I'm not sure    2   <NA>
#3   3                   pink   12   <NA>
#4   4                 yellow   15   <NA>
#5   5       too soon to tell   20 unsure
#6   6 I think it is too soon    2 unsure
#7   7                 5 days    6   <NA>
#8   8                    red    7   <NA>
#9   9        its been 2 days    3   <NA>
#10 10       too soon to tell   11 unsure
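If the list of phrases grows, one option is to keep them in a character vector and collapse them into a single alternation pattern with paste(collapse = "|"). The sketch below assumes the phrases contain no regex metacharacters (such as . or ?), and the names unsure_phrases and unsure_pattern are hypothetical. Wrapping the pattern in stringr's regex() with ignore_case = TRUE is an optional change: "Too soon" would then also match, unlike the case-sensitive version above.

```r
library(dplyr)
library(stringr)

# Sample data from the question (10 rows; id and date kept as character)
df <- data.frame(
  id = as.character(1:10),
  freetext = c("its too soon", "I'm not sure", "pink", "yellow",
               "too soon to tell", "I think it is too soon", "5 days", "red",
               "its been 2 days", "too soon to tell"),
  date = c("1", "2", "12", "15", "20", "2", "6", "7", "3", "11"),
  stringsAsFactors = FALSE
)

# Hypothetical phrase list: add more phrases here as needed
unsure_phrases <- c("too soon", "to tell")

# Collapse into one alternation pattern: "too soon|to tell"
unsure_pattern <- paste(unsure_phrases, collapse = "|")

# Case-insensitive match (assumption: that is desirable); non-matches get NA_character_
df <- df %>%
  mutate(unsure = if_else(
    str_detect(freetext, regex(unsure_pattern, ignore_case = TRUE)),
    "unsure",
    NA_character_
  ))

df
```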

In base R:

transform(df, unsure = ifelse(grepl('too soon|to tell', freetext), 'unsure', NA))
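A base R sketch of the same phrase-vector idea. One detail worth knowing: ifelse() builds its result starting from the logical test, so with a plain NA in the "no" branch the new column stays logical (not character) if no row happens to match. Using NA_character_ keeps it character either way. The names unsure_phrases and unsure_pattern are hypothetical, and the phrases are again assumed to contain no regex metacharacters.

```r
# Sample data from the question (10 rows; id and date kept as character)
df <- data.frame(
  id = as.character(1:10),
  freetext = c("its too soon", "I'm not sure", "pink", "yellow",
               "too soon to tell", "I think it is too soon", "5 days", "red",
               "its been 2 days", "too soon to tell"),
  date = c("1", "2", "12", "15", "20", "2", "6", "7", "3", "11"),
  stringsAsFactors = FALSE
)

# Hypothetical phrase list collapsed into one regex: "too soon|to tell"
unsure_phrases <- c("too soon", "to tell")
unsure_pattern <- paste(unsure_phrases, collapse = "|")

# grepl() gives one TRUE/FALSE per row; NA_character_ keeps the column character
df$unsure <- ifelse(grepl(unsure_pattern, df$freetext), "unsure", NA_character_)

df
```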
Ronak Shah