1

Here is my data:

df <- tibble::tribble(
  ~A,  ~B,  ~C,  ~D,
  2L, "a", "e", 2L,
  4L, "a", "f", NA_integer_,
  4L, "b", "g", NA_integer_,
  4L, "b", "h", NA_integer_
  )

df$B <- as.factor(df$B) 
df$A <- as.factor(as.character(df$A)) 

Here is my filter condition as a character:

remove2 <- "as.integer(A)!=2L"

I just want to remove the observations with A == 2, but the following code keeps that row instead. Why?

df %>% dplyr::filter_(remove2)

I want to use filter_ because it accepts the condition as a character string. A solution that uses filter (the version without the underscore) and still takes the condition as a character string would also work.

Geet
  • 4
    The problem is the as.integer part of your filter. If you do as.integer(df$A) the return is 1, 2, 2, 2. Not 2, 4, 4, 4 as you expect. See [this SO post](https://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-integer-numeric-without-loss-of-information) – phiver May 19 '18 at 10:30
  • I guess the other answers' comments make the point indirectly - I would maybe try to avoid factors in the first place. Why not simply keep the variable as character/integer? You can always factorise ad hoc for whatever you need – tjebo May 19 '18 at 14:22
  • 1
    Out of curiosity, why do you want the condition as a character? Are you intending to link this to some user input, or something like that? – camille May 19 '18 at 16:42
  • @phiver I did not know that! Good to know. Thanks! – Geet May 20 '18 at 07:21
  • @camille I thought filter_ would accept quoted expression and solve the problem. – Geet May 20 '18 at 07:21

3 Answers

3

Try the following:

remove2 <- "as.numeric(as.character(A))!=2L"

df %>% dplyr::filter_(remove2)

# A tibble: 3 x 4
  A     B     C         D
  <fct> <fct> <chr> <int>
1 4     a     f        NA
2 4     b     g        NA
3 4     b     h        NA

Note that factors are encoded differently. See

 as.integer(df$A)
 [1] 1 2 2 2

To get the values of the factors "as shown", use as.numeric(as.character(.))
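
As a small base R illustration of the difference between a factor's internal codes and its labels, here is a sketch in which the vector `f` is a made-up stand-in for `df$A`:

```r
# Hypothetical stand-in for df$A: a factor built from the strings "2", "4", "4", "4"
f <- factor(c("2", "4", "4", "4"))

# as.integer() returns the internal level codes, not the labels
codes <- as.integer(f)

# Going through as.character() first recovers the labels as numbers
values <- as.integer(as.character(f))

# Comparing the codes to 2L keeps the wrong element;
# comparing the converted labels keeps the intended ones
keep_by_codes  <- codes != 2L
keep_by_values <- values != 2L
```

Here `codes` is 1, 2, 2, 2 while `values` is 2, 4, 4, 4, which is why the original filter kept the A == 2 row and dropped the others.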

Other answers have pointed out that the underscore functions have been deprecated (though they still work). To achieve this in a way that does not depend on dplyr at all, it might be a good idea to use simple base R:

df[which(df[["A"]] != 2L),]
# A tibble: 3 x 4
  A     B     C         D
  <fct> <fct> <chr> <int>
1 4     a     f        NA
2 4     b     g        NA
3 4     b     h        NA
coffeinjunky
  • As @Aurèle points out below, the underscore versions of `dplyr` functions have been deprecated in favor of NSE/`rlang`-based operations – camille May 19 '18 at 16:40
  • @coffeinjunky The base solution is interesting! Good to know that. – Geet May 20 '18 at 07:26
3

Others have explained the cause of this issue: a factor is stored internally as integer codes, which can differ from the labels it displays. The other thing I want to point out is that filter_ has been deprecated since dplyr 0.7. So we can instead evaluate the string inside the filter function, using either of the following two options.

remove2 <- "as.integer(as.character(A)) != 2L"

library(dplyr)
library(rlang)

df %>% filter(eval(parse(text = remove2)))
# # A tibble: 3 x 4
#   A     B     C         D
#   <fct> <fct> <chr> <int>
# 1 4     a     f        NA
# 2 4     b     g        NA
# 3 4     b     h        NA

df %>% filter(eval(parse_expr(remove2)))
# # A tibble: 3 x 4
#   A     B     C         D
#   <fct> <fct> <chr> <int>
# 1 4     a     f        NA
# 2 4     b     g        NA
# 3 4     b     h        NA
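
A related variant, sketched here on the assumption that dplyr and rlang are installed, parses the string once and splices the resulting expression into filter with `!!` rather than calling eval inside the filter. The data frame `d` and the name `cond` are made up for illustration.

```r
library(dplyr)
library(rlang)

# Made-up data mirroring column A of the question (a factor with labels "2" and "4")
d <- tibble(A = factor(c("2", "4", "4", "4")), C = c("e", "f", "g", "h"))

cond <- "as.integer(as.character(A)) != 2L"

# parse_expr() turns the string into an expression;
# !! splices that expression into the filter() call
res <- d %>% filter(!!parse_expr(cond))
res
```

This should keep only the three rows whose label is "4", the same rows as the two options above.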
www
  • 2
    I'm not the downvoter (actually I upvoted). Just my 2cts: code as string seems to be an antipattern. I'd go with: `remove2 <- rlang::expr(as.numeric(as.character(A)) != 2L) ; filter(df, !! remove2)` – Aurèle May 19 '18 at 12:52
  • 1
    (or with `quote` instead of `rlang::expr`) – Aurèle May 19 '18 at 14:14
  • @Aurèle would you mind posting an answer with some `rlang` options? I'm trying to wrap my head around this, and also interested in how to do this starting with the operation in a string, as the OP has – camille May 19 '18 at 16:59
  • @camille Sure, https://stackoverflow.com/a/50428178/6197649 . I don't recommend code in a string, for reasons detailed in my answer. Though if we were to do it, I'd go with this answer from www – Aurèle May 19 '18 at 18:15
  • @www the eval(parse_expr.. is simply elegant and quite sophisticated for me!! – Geet May 20 '18 at 07:28
3

Code as a string is an anti-pattern. It raises the question: where does the string come from?

If it's you, the developer, typing it, it's both more difficult to write (you don't benefit from IDE features such as auto-completion) and much more prone to bugs (you can write syntactically invalid code that won't get caught until it's actually parsed and evaluated, possibly much later, raising harder-to-understand errors).
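
To make the "caught much later" point concrete, here is a minimal base R sketch. The string `bad_cond` is a made-up condition with a deliberately unbalanced parenthesis.

```r
# A condition with a typo (missing closing parenthesis).
# Assigning the string raises no error, because it is just text at this point.
bad_cond <- "as.integer(as.character(A) != 2L"

# The problem only surfaces when the string is finally parsed
result <- tryCatch(
  parse(text = bad_cond),
  error = function(e) "parse error"
)
result
```

Had the same condition been written as code inside `quote()`, R would have rejected it as soon as the script was read, rather than at the point where the string is used.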

If it's input from a user that is not you, it's a major security hole.

You could do instead:

remove2 <- quote(as.numeric(as.character(A)) != 2L)

filter(df, !! remove2)

(!! is the "unquote" operator in the tidyeval framework).

Though it's not completely satisfying either (still a code smell, in my opinion), because it's rare to have to unquote entire pieces of code; usually it's just a variable name.
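
For the more common case where only the variable name varies, a sketch might look like the following. It assumes dplyr and rlang are installed; the data frame `d` and the name `col` are made up for illustration, and A is kept as a plain integer column here rather than a factor.

```r
library(dplyr)
library(rlang)

# Made-up data: A stored as integers, so no factor conversion is needed
d <- tibble(A = c(2L, 4L, 4L, 4L), C = c("e", "f", "g", "h"))

# Only the column name is held in a variable; the rest of the condition is ordinary code
col <- sym("A")

# The parentheses make explicit that !! applies to col alone
res <- filter(d, (!!col) != 2L)
res
```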

Aurèle
  • I thought filter_ would accept quoted expression and solve the problem, but this quote unquote is great to avoid that. – Geet May 20 '18 at 07:24