
I have a working filter statement in dplyr that I can't translate to base R:

library(dplyr)
x <- data.frame(
    v1 = c("USA", "Canada", "Mexico"),
    v2 = c(NA, 1:5)
  )

# dplyr version, which works
x %>% filter(v1=="Canada",v2 %in% 3:5)

# my base R translation, which does not give the same result
x[x$v1=="Canada" && x$v2 %in% 3:5,]

Any help would be appreciated.

Carl

2 Answers


To illustrate:

library(dplyr)
x <- data.frame(
   v1 = c("USA", "Canada", "Mexico"),
   v2 = c(NA, 1:5)
)

# filter 
x %>% filter(v1=="Canada",v2 %in% 3:5)
      v1 v2
1 Canada  4

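# your approach (output from older R; R >= 4.3.0 raises an error instead, because && now requires length-one operands)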
x[x$v1=="Canada" && x$v2 %in% 3:5,]
 v1 v2
<0 rows> (or 0-length row.names)

# one & removed (vectorised AND)
x[x$v1=="Canada" & x$v2 %in% 3:5,]
      v1 v2
5 Canada  4

Apart from the row name, it gives the same result.

Look at this example (taken from here) to understand what was happening before:

-2:2 >= 0
[1] FALSE FALSE  TRUE  TRUE  TRUE
-2:2 >= 0 & -2:2 <= 0
[1] FALSE FALSE  TRUE FALSE FALSE
-2:2 >= 0 && -2:2 <= 0
[1] FALSE
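A minimal sketch that makes the same point with the data frame from the question: a single FALSE used as a row index is recycled to every row, which is why the && version selected nothing, while & yields one value per row. safe_positive() is a hypothetical helper (not from the original answer) showing where && does belong: length-one conditions, where short-circuiting skips the remaining checks.

```r
x <- data.frame(
  v1 = c("USA", "Canada", "Mexico"),
  v2 = c(NA, 1:5)
)

# A single FALSE is recycled to every row, so no rows are selected
nrow(x[FALSE, ])

# `&` is vectorised: one TRUE/FALSE per row (%in% never returns NA)
keep <- x$v1 == "Canada" & x$v2 %in% 3:5
print(keep)
x[keep, ]

# Hypothetical helper: `&&` suits length-one conditions and short-circuits,
# so z > 0 is only evaluated when the earlier checks are TRUE
safe_positive <- function(z) {
  length(z) == 1 && is.numeric(z) && z > 0
}
safe_positive(3)
safe_positive(c(1, 2))
safe_positive("a")
```

Because safe_positive() checks length(z) == 1 first, && never sees a longer vector, so it behaves the same in old and new versions of R.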

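In some situations you may run into problems with NAs. An NA in a logical index returns a row of NAs, whereas filter drops rows for which the condition is NA. Wrapping the logical statement in which gives the same behaviour as filter, because which returns only the positions that are TRUE. E.g.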

# includes an all-NA row:
x[x$v2 > 3,]
       v1 v2
NA   <NA> NA
5  Canada  4
6  Mexico  5

# will exclude NA 
x[which(x$v2 > 3),]
      v1 v2
5 Canada  4
6 Mexico  5
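Putting the two fixes together (& instead of &&, and which to drop NAs), here is a minimal sketch of a base R stand-in for filter with several conditions. filter_base() is a hypothetical name, not part of base R or dplyr, and it takes already-evaluated logical vectors rather than the bare column names filter accepts.

```r
x <- data.frame(
  v1 = c("USA", "Canada", "Mexico"),
  v2 = c(NA, 1:5)
)

# Hypothetical helper: combines any number of logical vectors with `&`,
# then keeps rows where the result is TRUE. which() drops NA positions,
# mirroring how filter() discards rows whose condition is NA.
filter_base <- function(df, ...) {
  keep <- Reduce(`&`, list(...))
  df[which(keep), , drop = FALSE]
}

filter_base(x, x$v1 == "Canada", x$v2 %in% 3:5)
filter_base(x, x$v2 > 3)
```

Passing the conditions as separate arguments mimics filter's comma syntax, where commas act as AND.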
coffeinjunky
  • Man, I thought `&&` was a short-circuited "and", meaning essentially that `x && y` means if x is false, don't even bother evaluating y. Similarly, `x || y` means if x is true, don't even bother evaluating y, but I was very wrong apparently. Thanks – Carl May 31 '16 at 19:06
  • Well, it is! But you don't want that here. You want a vector of `TRUE` and `FALSE` values, one for each observation. If you supply a single `FALSE` when `R` expects a vector, `R` will recycle it. Therefore, you say `FALSE` for each and every row, and thus you get the above outcome (no rows). – coffeinjunky May 31 '16 at 19:15

subset is in base R and works similarly to filter in dplyr. Is subset sufficient for you, or do you need bracket notation for some reason?

> x <- data.frame(
+     v1 = c("USA", "Canada", "Mexico"),
+     v2 = c(NA, 1:5)
+ )

Via dplyr:

> x %>% filter(v1=="Canada",v2 %in% 3:5)
      v1 v2
1 Canada  4

Via base R/subset:

> subset(x, v1 == 'Canada' & v2 %in% 3:5)
      v1 v2
5 Canada  4
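A minimal sketch of two further subset features that bring it closer to dplyr, using the same data: like filter, it drops rows whose condition is NA, and its select argument picks columns.

```r
x <- data.frame(
  v1 = c("USA", "Canada", "Mexico"),
  v2 = c(NA, 1:5)
)

# subset() keeps rows where the condition is TRUE and drops NA results,
# so this matches x[which(x$v2 > 3), ] rather than x[x$v2 > 3, ]
subset(x, v2 > 3)

# select= chooses columns by name, somewhat like dplyr::select()
subset(x, v1 == "Canada" & v2 %in% 3:5, select = v2)
```

The ?subset help page describes it as a convenience function intended for interactive use and recommends the standard [ subsetting for programming, so the bracket form with which may be preferable inside functions.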
Tyler Byers