
I have created a dataset in R as follows:

m <- mtcars
m$dep<- ifelse(m$mpg <=16,1,0)

Now I sum the variable dep within the groups defined by cyl:

a <- aggregate(dep ~ cyl, FUN=sum, data=m)
a

This gives the desired result. However, when I try to wrap it in a user-defined function to automate it, I get an error. I tried the following code:

f<- function(target,variable,data){
  a <-aggregate(target ~ variable, FUN=sum, data=data)
  return(a)
}
f(dep,cyl,m)

Could you please help me fix the function? Could you also tell me when I should use double quotes when calling a function, e.g. f("dep","cyl",m)? I tried that call with my function as well, but it did not work either.

s_scolary
shejomamu

2 Answers

1) It is easier not to use the formula interface in this case. First get target and variable names as character strings and then run the aggregate:

f1 <- function(target, variable, data) {
  target <- deparse(substitute(target))
  variable <- deparse(substitute(variable))
  aggregate(data[target], data[variable], sum)
}
f1(dep, cyl, m)

giving:

  cyl dep
1   4   0
2   6   0
3   8  10
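
To see why f1 works, here is a minimal sketch of its two ingredients in isolation. The helper name arg_name is hypothetical and used only for illustration; it is not part of the answer above.

```r
# Hypothetical helper: returns the name of whatever was passed as x,
# without ever evaluating x.
arg_name <- function(x) deparse(substitute(x))

nm <- arg_name(cyl)   # no object called cyl needs to exist here
nm                    # a character string

# Rebuild the data from the question.
m <- mtcars
m$dep <- ifelse(m$mpg <= 16, 1, 0)

# Indexing a data frame with a string in single brackets keeps it a
# one-column data frame, which is the form aggregate() is given in f1.
one_col <- m[nm]
str(one_col)
```

So inside f1 the bare names dep and cyl are turned into strings first, and only those strings are used to pick columns out of data.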

2) If you want to pass the column names directly as character strings, rather than as unevaluated expressions as we did above, it is even easier and gives the same output:

f2 <- function(target, variable, data) {
  aggregate(data[target], data[variable], sum)
}
f2("dep", "cyl", m)
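
One practical reason to prefer quoted names, sketched here under the assumption that you want the same summary for several columns: strings can be stored in a vector and looped over. The choice of am as a second target column is only an example.

```r
# Rebuild the data and f2 from above so this block runs on its own.
m <- mtcars
m$dep <- ifelse(m$mpg <= 16, 1, 0)

f2 <- function(target, variable, data) {
  aggregate(data[target], data[variable], sum)
}

# Column names held as strings can be passed programmatically.
targets <- c("dep", "am")
results <- lapply(targets, f2, variable = "cyl", data = m)
names(results) <- targets
results
```

The same loop is awkward with f1, because deparse(substitute(target)) would capture the name of the loop variable instead of the column.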

3) Although the question asked for an aggregate solution, it carried an sqldf tag, so here is an sqldf solution in which the names are passed as character strings. If you want to pass unevaluated expressions instead, use the same approach as in (1) with deparse(substitute(...)):

library(sqldf)
f3 <- function(target, variable, data) {
    fn$sqldf("select $variable, sum($target) from data group by $variable")
}
f3("dep", "cyl", m)
G. Grothendieck

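Inside the function, target ~ variable is taken literally, so the formula has to be built from the column names instead. Pass the names as character strings, paste them together, and convert the result with as.formula. This should get you the desired output.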

f <- function(target,variable,data){
  a <- aggregate(as.formula(paste(target,variable,sep=" ~ ")), FUN = sum, data = data)
  return(a)
}

> f("dep","cyl",m)
  cyl dep
1   4   0
2   6   0
3   8  10
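
As a possible alternative to paste plus as.formula, base R's reformulate() builds the same kind of formula from strings. This is only a sketch; the function name f_ref is made up for illustration.

```r
# Rebuild the data from the question.
m <- mtcars
m$dep <- ifelse(m$mpg <= 16, 1, 0)

# f_ref is a hypothetical name. reformulate("cyl", response = "dep")
# produces the formula dep ~ cyl.
f_ref <- function(target, variable, data) {
  aggregate(reformulate(variable, response = target), data = data, FUN = sum)
}

res <- f_ref("dep", "cyl", m)
res
```

Both versions expect quoted column names; the difference is only in how the formula is assembled.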
s_scolary
  • I can't thank you enough for your help. I would also like to thank G. Grothendieck for showing the use of sqldf; I was just about to ask how to do it that way. May I ask for one more suggestion? Where can I learn to write functions the way you have shown? Is there a book or website I could follow? – shejomamu Nov 25 '15 at 08:24