dplyr NSE mode in a function: nested conditions

Question

The objective is to transform column(s) of a data frame. Here is the example:

  df <- data.frame( fact=c("dog",2,"NA",0,"cat",1,"Cat"),
              value=c(4,2,6,0,9,1,3) ); df$fact <- as.factor(df$fact)

  func <- function(data,fac,val){
          data <- data %>%  
          mutate_(fac= interp(~tolower(fac), fac=as.name(fac)) ) %>%
          mutate_(val= interp(~ifelse(fac=='cat',1*val,
                       ifelse(fac=='dog',2*val,0)), fac=as.name(fac), val=as.name(val)))
  return(data) }

The call:

new.df <- func(df,"fact","value")

     fact value  fac val
   1  dog     4  dog  8
   2    2     2   2   0
   3   NA     6  na   0
   4    0     0   0   0
   5  cat     9 cat   9
   6    1     1   1   0
   7  Cat     3 cat   0

The result has two problems: (1) the value for "Cat" is wrong; it should be 1*3 = 3. (2) Ideally, the call would return the original data.frame df with the transformed fact and value variables.

Any thoughts? Thank you guys.

Edit: please note that df has another column third which should be left unaffected by the operations done to fact and value.

akrun · Answer 1 · 2015-09-24T14:21:36.727

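In the OP's code, 'val' was computed from the unmodified 'fact' column: interp() replaced fac with the name of the original column, so 'Cat' never matched 'cat'. To use the lowercased 'fac' column created by the first mutate_, drop the fac=as.name(fac) argument from the second interp() call.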

library(lazyeval)
library(dplyr)
func <- function(data,fac,val){
      data <- data %>%  
               mutate_(fac= interp(~tolower(fac), fac=as.name(fac))) %>%
               mutate_(val= interp(~ifelse(fac=='cat',1*val,
                   ifelse(fac=='dog',2*val,0)), val=as.name(val)))
  return(data) } 

func(df, 'fact', 'value')
#  fact value fac val
#1  dog     4 dog   8
#2    2     2   2   0
#3   NA     6  na   0
#4    0     0   0   0
#5  cat     9 cat   9
#6    1     1   1   0
#7  Cat     3 cat   3
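
To see the cause in isolation, here is a minimal base-R sketch without dplyr. The vectors are copied from the question's df; the names res_original and res_lowered are only illustrative and do not appear in the original code.

```r
# Values copied from the question's data frame
fact  <- c("dog", "2", "NA", "0", "cat", "1", "Cat")
value <- c(4, 2, 6, 0, 9, 1, 3)

# Comparing against the original column: "Cat" is not equal to "cat",
# so row 7 falls through to the final 0
res_original <- ifelse(fact == "cat", 1 * value,
                ifelse(fact == "dog", 2 * value, 0))

# Comparing against the lowercased column: row 7 now matches "cat"
fac <- tolower(fact)
res_lowered <- ifelse(fac == "cat", 1 * value,
               ifelse(fac == "dog", 2 * value, 0))

print(res_original)  # 8 0 0 0 9 0 0
print(res_lowered)   # 8 0 0 0 9 0 3
```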

If we need to return only the modified columns, use transmute_, which drops all other columns:

func1 <- function(data,fac,val){
      data <- data %>%  
               transmute_(fac= interp(~tolower(fac), fac=as.name(fac)), 
                      val= interp(~ifelse(fac=='cat',1*val,
                         ifelse(fac=='dog',2*val,0)), val=as.name(val)))
     return(data) } 

func1(df, 'fact', 'value')
#  fac val
#1 dog   8
#2   2   0
#3  na   0
#4   0   0
#5 cat   9
#6   1   0
#7 cat   3

@akrun: thanks so much, didn't know about `transmute`. — remi, Sep 24 '15 at 14:19

score 2 · Accepted Answer · answered Sep 24 '15 at 14:28

If you want to return the original columns (potentially including other columns in your data.frame) with their original names, you can use a slightly different dplyr approach with mutate_each_ instead of mutate_:

library(lazyeval)
library(dplyr)

func <- function(data,fac,val) {
  data %>%  
    mutate_each_(interp(~tolower(var), var = as.name(fac)), fac) %>% 
    mutate_each_(interp(~ifelse(col =='cat', var, ifelse(col == 'dog',2*var, 0)), 
             var=as.name(val), col = as.name(fac)), val)
}

Using the function:

func(df, "fact", "value")
#  fact value
#1  dog     8
#2    2     0
#3   na     0
#4    0     0
#5  cat     9
#6    1     0
#7  cat     3

The difference from akrun's answer shows up when your data has other columns that you would like to keep (they would be dropped by akrun's approach because of transmute_):

df$some_column <- letters[1:7]  # add a new column

Other columns now remain in your data after using the function and the modified columns keep their original names:

func(df, "fact", "value")
#  fact value some_column
#1  dog     8           a
#2    2     0           b
#3   na     0           c
#4    0     0           d
#5  cat     9           e
#6    1     0           f
#7  cat     3           g
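
The underscored verbs (mutate_, mutate_each_) and the lazyeval-based interface have since been deprecated in dplyr. Below is a hedged sketch of the same idea using the .data pronoun and `!!name :=`, assuming dplyr 0.7 or later is installed. The function name func_tidy is hypothetical and not part of either answer.

```r
library(dplyr)

# Example data mirroring the question, plus an extra column to keep
df <- data.frame(fact = c("dog", "2", "NA", "0", "cat", "1", "Cat"),
                 value = c(4, 2, 6, 0, 9, 1, 3),
                 some_column = letters[1:7])
df$fact <- as.factor(df$fact)

# func_tidy is an illustrative name; fac and val are column names as strings
func_tidy <- function(data, fac, val) {
  data %>%
    # overwrite the column named by `fac` with its lowercased values
    mutate(!!fac := tolower(as.character(.data[[fac]]))) %>%
    # overwrite the column named by `val`, using the already lowercased column
    mutate(!!val := ifelse(.data[[fac]] == "cat", 1 * .data[[val]],
                    ifelse(.data[[fac]] == "dog", 2 * .data[[val]], 0)))
}

res <- func_tidy(df, "fact", "value")
print(res)
```

As in the mutate_each_ version, the modified columns keep their original names and some_column is left untouched.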

@docendo: I realize akrun's solution has one issue that I hadn't mentioned in the question: `df` has other columns (i.e. variables) which should remain unaffected by the transformations applied to `fact` and `value`. Any idea how to achieve this? — remi, Sep 24 '15 at 15:08
@remi, have you tried using the function in my answer? It does exactly that. — talat, Sep 24 '15 at 15:13
@docendo: indeed, your solution resolved the issue before I became aware of it! I have another worry :-\ : handling the NAs, given the `tolower` operation. Should one include `na.omit()` to resolve this? — remi, Sep 24 '15 at 16:59
