
I have this column in a df:

Column1
very sunny day today
it was sunny
not very sunny today

Desired output:

Column1
sunny
sunny
not sunny

Here is my code:

df <- df %>%
  mutate(column_1 = case_when(
    str_detect(column_1, "very sunny") ~ "sunny",
    str_detect(column_1, "sunny") ~ "sunny",
    str_detect(column_1, "not" & "sunny") ~ " not sunny",
  ))

The code works well for the first two rows, where the conditions are simple. For the third row the condition is more complicated, and it gives me an error.

I want to identify keywords in the string that are not adjacent (as in "very sunny") but separated by other words (as in "not very sunny today"), and use them as conditions to produce the desired output. Maybe I am getting the syntax wrong.

user438383
pipts

3 Answers


Should it be `&&` instead of `&` in the third condition?

  • Thanks for the comment, but I still get an error: invalid 'x' type in 'x && y' – pipts Jul 02 '21 at 15:18
  • I think it might work if you split the third statement into two... but because the second statement returns TRUE first, `case_when()` will finish before it reaches the third. – Tech Commodities Jul 02 '21 at 15:46
  • In your simple case you only need to detect the word "not", so instead of `case_when()` you can use `ifelse()`: `df %>% mutate(column_1 = ifelse(str_detect(column_1, "not"), "not sunny", "sunny"))`. This becomes complicated if there are more negative words, though. – Tech Commodities Jul 02 '21 at 15:54
  • Thank you for the comment and the help. I have more rows in my actual dataset, for example the value 'not very rainy today', for which the desired output is 'not rainy'. That case is more complicated because there are two keywords. The code I posted works fine except for the last line. The operator '&' results in an error and I don't know why. @TechCommodities – pipts Jul 03 '21 at 01:57
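
The error discussed in these comments can be reproduced without dplyr. A minimal sketch in base R, assuming the three example strings from the question (the names `x`, `bad`, `has_not`, `has_sunny` and `both` are made up for illustration): `&` combines logical vectors, so it has to be applied to the results of two pattern tests, not to the two pattern strings.

```r
x <- c("very sunny day today", "it was sunny", "not very sunny today")

# `&` is not defined for character strings, so this call throws an error;
# tryCatch() captures the message instead of stopping the script.
bad <- tryCatch("not" & "sunny", error = function(e) conditionMessage(e))

# Each pattern test returns a logical vector, one element per string ...
has_not   <- grepl("not", x, fixed = TRUE)
has_sunny <- grepl("sunny", x, fixed = TRUE)

# ... and two logical vectors can be combined element-wise with `&`.
both <- has_not & has_sunny
```

`&&` does not help here either: it expects single logical values, which is why swapping the operator only changes the error message.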
df$column2 <- sub('(not )?.*(sunny).*', '\\1\\2', df$Column1)
df
               Column1   column2
1 very sunny day today     sunny
2         it was sunny     sunny
3 not very sunny today not sunny
Onyambu
  • Mr. Onyambu, I didn't know that if we leave out a back reference like `\\2`, whatever the second capture group matched is replaced by "". Very interesting. – Anoushiravan R Jul 03 '21 at 15:02
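
The back-reference behaviour mentioned in this comment can be checked in isolation. A small sketch, assuming the same pattern as the `sub()` call in this answer (the variable names and the `sunny|rainy` alternation are illustrative additions, not part of the original answer): because the pattern matches the whole string, `sub()` replaces everything with only the groups named in the replacement.

```r
x <- c("very sunny day today", "it was sunny", "not very sunny today")
pattern <- "(not )?.*(sunny).*"

# Keep both groups: the optional "not " (empty when absent) and "sunny".
both_groups <- sub(pattern, "\\1\\2", x)

# Keep only the second group: the "not " prefix is dropped everywhere.
second_only <- sub(pattern, "\\2", x)

# Hypothetical extension for more keywords, e.g. the "rainy" case from the comments.
rainy <- sub("(not )?.*(sunny|rainy).*", "\\1\\2", "not very rainy today")
```

Note that the optional `(not )?` group is only tried at the start of the string, so this pattern would not pick up a "not" that comes after the keyword.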

Try this.

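library(dplyr)
library(stringr)

# Test data: the original rows plus a few extra edge cases
df <- data.frame(column_1 = c("very sunny today", "sunny today", "not very sunny today",
                              "very very sunny today", "sunny today, not",
                              "not sunny today", "no sun today"))

df %>%
  mutate(column_1 = case_when(
    # the negated case must come first, because case_when() stops at the first match
    str_detect(column_1, "not") & str_detect(column_1, "sunny") ~ "not sunny",
    str_detect(column_1, "very sunny") ~ "sunny",
    str_detect(column_1, "sunny") ~ "sunny",
    TRUE ~ "Unspecified"  # fallback for rows that match none of the above
  ))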

This copes with "not" coming before or after "sunny", separated by zero or more words. I added a few more examples to your test data frame. It's good practice to include a `TRUE ~ ""` fallback in `case_when()` unless you're sure every possible input is covered by an earlier condition.
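
For the multi-keyword case raised in the comments ("not very rainy today" should become "not rainy"), the same idea can be generalised. This is only a sketch under assumptions: the keyword list `sunny|rainy` and the helper columns `keyword`, `negated` and `label` are hypothetical names, and it assumes one keyword per row.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  column_1 = c("very sunny day today", "it was sunny",
               "not very sunny today", "not very rainy today", "no sun today")
)

# Hypothetical keyword list; extend it with the words that occur in the real data.
keywords <- "sunny|rainy"

result <- df %>%
  mutate(
    keyword = str_extract(column_1, keywords),    # first keyword found, NA if none
    negated = str_detect(column_1, "\\bnot\\b"),  # TRUE when "not" occurs as a whole word
    label = case_when(
      is.na(keyword) ~ "Unspecified",             # no keyword at all
      negated        ~ paste("not", keyword),     # keyword plus "not" anywhere in the string
      TRUE           ~ keyword                    # keyword on its own
    )
  )
```

Extracting the keyword once avoids writing one `case_when()` branch per keyword, at the cost of treating any "not" in the string as negating that keyword.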

Tech Commodities