I am trying to write a function that takes two column names and an upper and/or lower boundary for each column, so that I can subset the data by columns and boundaries of my choice.
Using mtcars as an example, suppose I only want rows that have cyl > 4 and mpg > 15. In this case my function would take two column names, cyl and mpg, and a lower boundary for each, 4 and 15. The function should also let me set an upper boundary, to keep each column (variable) within a certain range.
So I came up with the function below, which takes two variable names of your choice and an upper and/or lower boundary for each variable.
If I give only an upper or only a lower boundary for a variable, it should return the rows below or above that boundary; if I give both an upper and a lower boundary, it should return the rows that fall within the range.
library(dplyr)
library(rlang)
comb_function<-function(df,var1,var2,var1_lower=NULL,var1_upper=NULL,var2_upper=NULL,var2_lower=NULL){
  var1<-enexpr(var1)
  var2<-enexpr(var2)
  #####filter1: condition on var2, chosen by which boundaries the user gave#####
  if(!is.null(var2_lower)&&!is.null(var2_upper)){
    #both boundaries given
    filter1<-expr(`$`(df,!!var2))<=var2_upper&expr(`$`(df,!!var2))>=var2_lower
  }else if(!is.null(var2_upper)){
    #only the upper boundary given
    filter1<-expr(`$`(df,!!var2))<=var2_upper
  }else if(!is.null(var2_lower)){
    #only the lower boundary given
    filter1<-expr(`$`(df,!!var2))>=var2_lower
  }
  #####filter2: the same three cases for var1#####
  if(!is.null(var1_lower)&&!is.null(var1_upper)){
    #both boundaries given
    filter2<-expr(`$`(df,!!var1))>=var1_lower&expr(`$`(df,!!var1))<=var1_upper
  }else if(!is.null(var1_upper)){
    #only the upper boundary given
    filter2<-expr(`$`(df,!!var1))<=var1_upper
  }else if(!is.null(var1_lower)){
    #only the lower boundary given
    filter2<-expr(`$`(df,!!var1))>=var1_lower
  }
  output<-df%>%filter(filter1,filter2)%>%summarise(count=n(),avgcyl=mean(cyl,na.rm=TRUE))
  return(output)
}
When I call this function, using mtcars as an example:
final1<-comb_function(df=mtcars,var1=mpg,var2=cyl,var1_lower =15,var2_lower=4,var2_upper=6)
I got a count of 0 and NaN for avgcyl in final1. So when filter() evaluates what is inside the (), it only gets FALSE and never TRUE, which I think is why no rows are returned.
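That reading can be checked in isolation. The sketch below assumes dplyr is attached and passes a literal FALSE to filter(). A length-one logical is recycled over every row, so no rows are kept, and mean() of the resulting empty column gives NaN, which matches the result described above.

```r
library(dplyr)

# A single FALSE is recycled across all 32 rows, so filter() keeps nothing.
empty <- mtcars %>%
  filter(FALSE) %>%
  summarise(count = n(), avgcyl = mean(cyl, na.rm = TRUE))

print(empty)
```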
I have a theory for why this is happening. If I do this:
x<-expr(cyl);eval(expr(expr(`$`(mtcars,!!x))<=6))
It returns:
[1] FALSE
which is clearly not what I expected. If I do this instead:
eval(expr(`$`(mtcars,!!x)))<=6
It returns
[1] TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
[23] FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
which is what I want filter() to see inside my function. So I am guessing that when filter() evaluates what's inside the (), it automatically puts brackets around the whole expression, just like
eval(expr(expr(`$`(mtcars,!!x))<=6))
did, which gives back only a single FALSE. If that really is the reason, how do I let filter() know that what I really want is for it to evaluate like this:
eval(filter1<-expr(`$`(df,!!var2)))<=var2_upper
not this:
eval(filter1<-expr(`$`(df,!!var2))<=var2_upper)
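One possible explanation, offered as a sketch rather than a definitive diagnosis: in both forms the comparison runs before filter() is ever called, so filter() never gets the chance to re-bracket anything. expr() returns an unevaluated call, and, as far as I understand base R's comparison operators, a call on one side of <= is deparsed to text and compared as a string, which yields a single logical value. Evaluating the call first produces the column, and the comparison is then element-wise. The names unevaluated, whole and per_row are only illustrative.

```r
library(rlang)

x <- expr(cyl)
unevaluated <- expr(`$`(mtcars, !!x))  # the call mtcars$cyl, not the column

class(unevaluated)                     # "call"

# Comparing the call itself: one logical value for the whole expression.
whole <- unevaluated <= 6
length(whole)

# Evaluating first, then comparing: one logical value per row.
per_row <- eval(unevaluated) <= 6
length(per_row)
```

In the function, filter1 and filter2 are computed the first way, so by the time they reach filter() they are already single logical values rather than conditions on the columns.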
If my guess is wrong, please explain what is actually going on.
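For comparison, here is a minimal sketch of one way the intended behaviour might be written, assuming dplyr and rlang are attached. The idea is to keep each condition as an unevaluated expression and let filter() evaluate it against the data, instead of comparing outside filter(). The helper bound_exprs() and the name comb_sketch() are hypothetical and not part of the original code. The comparisons are inclusive (>= and <=), following the function above.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: build one unevaluated condition per supplied bound.
bound_exprs <- function(var, lower = NULL, upper = NULL) {
  conds <- list()
  if (!is.null(lower)) conds <- c(conds, list(call2(">=", var, lower)))
  if (!is.null(upper)) conds <- c(conds, list(call2("<=", var, upper)))
  conds
}

comb_sketch <- function(df, var1, var2,
                        var1_lower = NULL, var1_upper = NULL,
                        var2_lower = NULL, var2_upper = NULL) {
  var1 <- enexpr(var1)
  var2 <- enexpr(var2)
  # Collect the conditions for both variables as a list of calls.
  conds <- c(
    bound_exprs(var1, var1_lower, var1_upper),
    bound_exprs(var2, var2_lower, var2_upper)
  )
  # !!! splices the calls into filter(), which evaluates them per row.
  df %>%
    filter(!!!conds) %>%
    summarise(count = n(), avgcyl = mean(cyl, na.rm = TRUE))
}

final1 <- comb_sketch(mtcars, mpg, cyl,
                      var1_lower = 15, var2_lower = 4, var2_upper = 6)
print(final1)
```

Because every bound is optional and each one contributes at most one condition, the nine combinations of upper and lower boundaries do not need separate branches.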