
I am quite new to R.

Using the table called SE_CSVLinelist_clean, I want to extract the rows where the variable where_case_travelled_1 DOES NOT contain the strings "Outside Canada" OR "Outside province/territory of residence but within Canada", and store the result in a new table called SE_CSVLinelist_filtered.

SE_CSVLinelist_filtered <- filter(SE_CSVLinelist_clean, 
where_case_travelled_1 %in% -c('Outside Canada','Outside province/territory of residence but within Canada'))

The code above runs when I use "c" instead of "-c", but then it keeps the rows I want to remove.
So how do I write it when I actually want to exclude the rows where the case travelled outside the country or province?

Ronak Shah
ayk
  • If you find yourself wanting to use "does not contain" often, you might want to define your own function. For example: `%notin%` = function(x, y) !(x %in% y). Then you can write x %notin% y instead of !(x %in% y). – eipi10 Dec 23 '15 at 22:09
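A minimal sketch of the helper from the comment, run on a small made-up vector (the name `travelled` and its values are hypothetical stand-ins for where_case_travelled_1, not the question's data):

```r
# Define a "not in" operator: TRUE where an element of x is absent from y.
# Backticks are needed because the name contains % signs.
`%notin%` <- function(x, y) !(x %in% y)

# Hypothetical values standing in for where_case_travelled_1
travelled <- c("Outside Canada", "Within province",
               "Outside province/territory of residence but within Canada")
excluded <- c("Outside Canada",
              "Outside province/territory of residence but within Canada")

travelled %notin% excluded
# [1] FALSE  TRUE FALSE
```

Inside filter() the same expression would read filter(SE_CSVLinelist_clean, where_case_travelled_1 %notin% excluded).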

4 Answers


Note that %in% returns a logical vector of TRUE and FALSE. To negate it, you can use ! in front of the logical statement:

library(dplyr)  # provides filter()

SE_CSVLinelist_filtered <- filter(SE_CSVLinelist_clean, 
 !where_case_travelled_1 %in% 
   c('Outside Canada','Outside province/territory of residence but within Canada'))
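filter() keeps the rows where the condition is TRUE, so it can help to look at the logical vector itself. The base-R sketch below uses hypothetical values (`travelled` is illustrative only). Note that %in% never returns NA, so a missing value counts as "not in the list" and that row is kept by the negated condition, whereas != propagates NA and filter() drops such rows:

```r
travelled <- c("Outside Canada", "Within province", NA)
excluded <- c("Outside Canada",
              "Outside province/territory of residence but within Canada")

in_list <- travelled %in% excluded   # only TRUE/FALSE, never NA
in_list
# [1]  TRUE FALSE FALSE

keep <- !travelled %in% excluded     # parsed as !(travelled %in% excluded)
keep
# [1] FALSE  TRUE  TRUE

# != propagates NA instead of returning FALSE
travelled != "Outside Canada"
# [1] FALSE  TRUE    NA
```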

Regarding your original approach with -c(...): here - acts as a unary operator, and the arithmetic operators "perform arithmetic on numeric or complex vectors (or objects which can be coerced to them)" (from help("-")). Since you are dealing with a character vector that cannot be coerced to numeric or complex, you cannot use -; R stops with the error "invalid argument to unary operator".
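A quick way to see this in a console (the string is just an example): unary minus fails on character input but negates numbers element-wise, which is why -c(1, 2) works for dropping positions when indexing while -c('Outside Canada') does not.

```r
# Unary minus on a character vector raises an error; capture its message
msg <- tryCatch(-c("Outside Canada"),
                error = function(e) conditionMessage(e))
msg
# [1] "invalid argument to unary operator"

# On a numeric vector it simply negates each element
-c(1, 2)
# [1] -1 -2
```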

talat
fishtank

Try wrapping the %in% condition in parentheses, as shown below. The expression inside the parentheses returns a logical vector (TRUE where the value matches one of the options). Comparing that result with == FALSE then keeps only the rows whose value does not belong to any of the options in the vector.

SE_CSVLinelist_filtered <- filter(SE_CSVLinelist_clean, 
(where_case_travelled_1 %in% c('Outside Canada','Outside province/territory of residence but within Canada')) == FALSE)
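To check that == FALSE behaves the same as the ! used in the other answers, here is a base-R sketch on hypothetical values:

```r
travelled <- c("Outside Canada", "Within province")
excluded <- c("Outside Canada",
              "Outside province/territory of residence but within Canada")

# Comparing the logical vector with FALSE gives the same result as negating it
with_eq  <- (travelled %in% excluded) == FALSE
with_not <- !(travelled %in% excluded)
identical(with_eq, with_not)
# [1] TRUE
```

Because %in% never returns NA, the two forms agree on every input.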
BWO

Just be careful with the previous solutions, since they require you to type out EXACTLY the string you are trying to detect.

Ask yourself if the word "Outside", for example, is sufficient. If so, then:

data_filtered <- data %>% 
  filter(!str_detect(where_case_travelled_1, "Outside"))

A reprex version:

library(dplyr)    # filter() and %>%
library(stringr)  # str_detect()

iris

iris %>% 
  filter(!str_detect(Species, "versicolor"))
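The same partial-match idea can be sketched in base R with grepl() in place of stringr::str_detect() (a swap made here only to avoid the package dependency). `fixed = TRUE` treats the pattern as a literal string rather than a regular expression. The values are hypothetical:

```r
travelled <- c("Outside Canada", "Within province",
               "Outside province/territory of residence but within Canada")

# TRUE wherever "Outside" appears anywhere in the string
has_outside <- grepl("Outside", travelled, fixed = TRUE)
!has_outside
# [1] FALSE  TRUE FALSE

# Matching is case-sensitive by default
grepl("outside", "Outside Canada", fixed = TRUE)
# [1] FALSE
```

One pattern covers both excluded strings here, but it would also drop any other value that happens to contain "Outside", so check the distinct values of the column first.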
gradcylinder
  • Technically a stringr function, not dplyr. But yes, part of the tidyverse. And a good solution at that. – Vance Lopez Dec 02 '21 at 16:20
  • You could also use str_detect(..., negate = TRUE) instead of the outer negation. – polmonroig Mar 14 '22 at 16:17
  • @polmonroig That's neat, I didn't know that! I guess they both read similarly: "filter iris by Species so that no strings are detected with 'versicolor'" vs. "filter iris by Species so that the 'versicolor' string is not detected". – gradcylinder Mar 15 '22 at 19:03

Quick fix. First define the opposite of %in%:

  '%ni%' <- Negate("%in%")

Then apply:

SE_CSVLinelist_filtered <- filter(
    SE_CSVLinelist_clean, 
    where_case_travelled_1 %ni% c('Outside Canada',
      'Outside province/territory of residence but within Canada'))
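Negate() takes a function and returns a new one that gives the logical opposite of the original's result, so `%ni%` is equivalent to writing !(x %in% y). A base-R check on hypothetical values:

```r
# Negate() wraps %in% so the new operator returns the opposite logical
`%ni%` <- Negate("%in%")

travelled <- c("Outside Canada", "Within province")
excluded <- c("Outside Canada",
              "Outside province/territory of residence but within Canada")

travelled %ni% excluded
# [1] FALSE  TRUE
identical(travelled %ni% excluded, !(travelled %in% excluded))
# [1] TRUE
```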
ToWii