I'm building a function that will manipulate a data frame based on a string. Within the function, I'll build a column name from the string and use it to manipulate the data frame, something like this:

library(dplyr)

orig_df  <- data_frame(
     id = 1:3
   , amt = c(100, 200, 300)
   , anyA = c(T,F,T)
   , othercol = c(F,F,T)
)


summarize_my_df_broken <- function(df, my_string) {

  my_column <- quo(paste0("any", my_string))

  df %>% 
    filter(!!my_column) %>% 
    group_by(othercol) %>% 
    summarize(
        n = n()
      , total = sum(amt)
    ) %>%
    # I need the original string as new column which is why I can't
    # pass in just the column name
    mutate(stringid = my_string)


}


summarize_my_df_works <- function(df, my_string) {

  my_column <- quo(paste0("any", my_string))

  df %>% 
    group_by(!!my_column, othercol) %>% 
    summarize(
        n = n()
      , total = sum(amt)
    )  %>%
    mutate(stringid = my_string)

}

# throws an error: 
# Argument 2 filter condition does not evaluate to a logical vector
summarize_my_df_broken(orig_df, "A")

# works just fine
summarize_my_df_works(orig_df, "A")

I understand what the problem is: unquoting the quosure as an argument to filter() in the broken version is not referencing the actual column anyA.

What I don't understand is why it works in summarize(), but not in filter()--why is there a difference?

crazybilly

2 Answers

4

Right now you are making quosures of strings, not symbol names. That's not how quosures are supposed to be used. There's a big difference between quo("hello") and quo(hello). If you want to make a proper symbol name from a string, you need to use rlang::sym. So a quick fix would be

summarize_my_df_broken <- function(df, my_string) {

  my_column <- rlang::sym(paste0("any", my_string))
  ...
}
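
For reference, here is a minimal, self-contained sketch of that fix applied to the whole function. The name summarize_my_df_fixed is made up for illustration, and tibble() stands in for the older data_frame(); the rest follows the question's code.

```r
library(dplyr)

# Same data as in the question
orig_df <- tibble(
  id = 1:3,
  amt = c(100, 200, 300),
  anyA = c(TRUE, FALSE, TRUE),
  othercol = c(FALSE, FALSE, TRUE)
)

# Hypothetical name; only the my_column line differs from the broken version
summarize_my_df_fixed <- function(df, my_string) {
  # sym() turns the string "anyA" into the symbol anyA
  my_column <- rlang::sym(paste0("any", my_string))

  df %>%
    filter(!!my_column) %>%          # now refers to the logical column anyA
    group_by(othercol) %>%
    summarize(n = n(), total = sum(amt)) %>%
    mutate(stringid = my_string)
}

res <- summarize_my_df_fixed(orig_df, "A")
print(res)
```

With the sample data, only rows 1 and 3 pass the filter, so the result should have one row per value of othercol.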

If you look more closely, I think you'll see that the group_by()/summarize() version isn't actually working the way you expect either; it just doesn't throw an error. These two do not produce the same results:

summarize_my_df_works(orig_df, "A")
#  `paste0("any", my_string)` othercol     n total
#                        <chr>    <lgl> <int> <dbl>
# 1                       anyA    FALSE     2   300
# 2                       anyA     TRUE     1   300

orig_df  %>% 
  group_by(anyA, othercol) %>% 
  summarize(
    n = n()
    , total = sum(amt)
  )  %>%
  mutate(stringid = "A")
#    anyA othercol     n total stringid
#   <lgl>    <lgl> <int> <dbl>    <chr>
# 1 FALSE    FALSE     1   200        A
# 2  TRUE    FALSE     1   100        A
# 3  TRUE     TRUE     1   300        A

Again the problem is using a string instead of a symbol.
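
To make the difference concrete, the sketch below evaluates both objects by hand with rlang::eval_tidy(). The one-column data frame is a stand-in for illustration. As far as I can tell, this is also why the two verbs behave differently: filter() needs a logical vector and gets the string "anyA", while group_by() accepts any expression and simply creates a constant grouping column from it.

```r
library(rlang)

my_string <- "A"

# A quosure wrapping a call: evaluating it yields a string, not a column
q <- quo(paste0("any", my_string))
print(is_call(quo_get_expr(q)))   # the captured expression is a paste0() call
q_value <- eval_tidy(q)
print(q_value)                    # a character value, not a logical vector

# A symbol built from the string: evaluating it in the data yields the column
col <- sym(paste0("any", my_string))
print(is_symbol(col))
df <- data.frame(anyA = c(TRUE, FALSE, TRUE))  # illustrative stand-in data
col_value <- eval_tidy(col, data = df)
print(col_value)                  # the logical column anyA
```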

MrFlick
  • Ah, I see! `quo()` turns a symbol into a quosure, `enquo()` turns the value of a function argument into a quosure, and `sym()` turns a string into a symbol. So I was passing in a string but treating it like a symbol. It was only appearing to work in `summarize_my_df_works()` because you can group by an arbitrary expression, not because it was actually doing what I expected. – crazybilly Oct 12 '17 at 17:15
0

You don't have any conditions for filter() in your 'broken' function, you just specify the column name.

Beyond that, I'm not sure if you can insert quosures into larger expressions. For example, here you might try something like:

df %>% filter((!!my_column) == TRUE)

But I don't think that would work.
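
For what it's worth, a quick check suggests that unquoting inside a larger expression does work, provided my_column is a symbol rather than a quosure of a string. The sketch below assumes the symbol is built with rlang::sym() and uses the question's sample data (with tibble() in place of data_frame()).

```r
library(dplyr)

# Same data as in the question
orig_df <- tibble(
  id = 1:3,
  amt = c(100, 200, 300),
  anyA = c(TRUE, FALSE, TRUE),
  othercol = c(FALSE, FALSE, TRUE)
)

# Assumption: my_column is a symbol, not quo(paste0(...))
my_column <- rlang::sym("anyA")

# The unquoted symbol is spliced into the comparison before filter() evaluates it
res <- orig_df %>% filter((!!my_column) == TRUE)
print(res)
```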

Instead, I would suggest using the conditional function filter_at() to target the appropriate column. In that case, you separate the quosure from the filter condition:

summarize_my_df_broken <- function(df, my_string) {

  my_column <- quo(paste0("any", my_string))

  df %>% 
    filter_at(vars(!!my_column), all_vars(. == TRUE)) %>% 
    group_by(othercol) %>% 
    summarize(
      n = n()
      , total = sum(amt)
    ) %>%
    mutate(stringid = my_string)

}

David Klotz
  • This isn't right. You can have a filter like `orig_df %>% filter(anyA)` work perfectly fine since `anyA` is a column of boolean values. Furthermore, if you are going to use `vars()` then you don't really need quosures, as that function also accepts strings just fine: `orig_df %>% filter_at(vars(paste0("any","A")), all_vars(. == TRUE))` – MrFlick Oct 12 '17 at 15:57
  • Using filter_at() is a good idea and certainly addresses the problem at hand--MrFlick's solution is a good one to the example problem. However, it doesn't get at my main question, namely why does the quosure work in summarize() but not in filter()? I suspect there's some foundational understanding that I'm missing regarding NSE. – crazybilly Oct 12 '17 at 16:51