I am a beginner in R and have run into a problem that might be simple for you. Thanks in advance for any help. I am not sure whether the title reflects the problem I want to ask about, so to make it clear I will use a simple example.

Let's say we have a data frame containing two factors (FE and DI) and three variables (SR1, SR2 and SR3), like:

# SR1 to SR3 are random Poisson counts, so the means will differ from run to run
df<-data.frame(FE=rep(c("FL","FM","FH"),4),DI=rep(c("DL","DH"),each=6),
SR1=rpois(12,10),SR2=rpois(12,15),SR3=rpois(12,20))

I know how to calculate the means of the variables for each factor level with aggregate(), for example:

df.me1<-aggregate(SR1~FE,df,mean)
df.me2<-aggregate(cbind(SR1,SR2,SR3)~FE+DI,df,mean)

Then I create two character vectors (vars and facs) holding the names of the three variables and the two factors:

vars<-c("SR1","SR2","SR3")
facs<-c("FE","DI")

Now, for some reason, I want to do the same calculations with the names taken from these vectors, as in:

df.me1<-aggregate(vars[1]~facs[1],df,mean)
df.me2<-aggregate(cbind(vars[1],vars[2],vars[3])~facs[1]+facs[2],df,mean)

This code certainly does not work, so what should I do to make it work this way?
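A quick way to see why this fails: a formula stores the expressions you type without evaluating them, so vars[1] ~ facs[1] refers to the objects vars and facs, not to the columns SR1 and FE. The sketch below uses only base R's deparse(), all.vars() and eval() to illustrate this; it is not taken from any of the answers.

```r
# Same character vectors as in the question
vars <- c("SR1", "SR2", "SR3")
facs <- c("FE", "DI")

# The formula keeps the expressions exactly as typed
f <- vars[1] ~ facs[1]
deparse(f)    # "vars[1] ~ facs[1]"
all.vars(f)   # "vars" "facs": the only names this formula refers to

# Evaluating the left-hand side yields a one-element string, not the SR1 column
eval(f[[2]])  # "SR1"
```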

Myosotis

3 Answers

There are two ways to do this. One is through aggregate's formula interface, which is what you're currently trying to do. To make that work, you'd have to build a string containing your dependent and independent variables and then convert it into a formula object with as.formula(). That is overcomplicated, because assembling the string takes a fair amount of fiddling with sprintf() and/or paste().
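A minimal sketch of that formula route, assuming the df, vars and facs objects from the question. It uses paste() plus as.formula(), and also stats::reformulate(), a base R helper not mentioned in this answer, which joins a vector of term names with "+" and attaches a response. It only covers a single response variable; several responses would need the cbind(...) left-hand side pasted together as text too.

```r
# Recreate the question's objects (rpois() draws random counts, so the means vary by run)
df <- data.frame(FE = rep(c("FL", "FM", "FH"), 4), DI = rep(c("DL", "DH"), each = 6),
                 SR1 = rpois(12, 10), SR2 = rpois(12, 15), SR3 = rpois(12, 20))
vars <- c("SR1", "SR2", "SR3")
facs <- c("FE", "DI")

# Paste the names into the text "SR1 ~ FE", then convert it to a formula object
f1 <- as.formula(paste(vars[1], "~", facs[1]))
# The formula is passed unnamed as the first argument
df.me1 <- aggregate(f1, data = df, FUN = mean)

# reformulate() joins the term names with "+" and adds the response,
# giving the formula SR1 ~ FE + DI without manual pasting
f2 <- reformulate(facs, response = vars[1])
df.me1b <- aggregate(f2, data = df, FUN = mean)
```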

A simpler way is to use aggregate's by argument, which is friendlier when the columns are selected through their names:

df.me1 <- aggregate(df[vars[1]], by = df[facs[1]], FUN = mean)

  FE   SR1
1 FH 10.00
2 FL 10.00
3 FM  9.25

df.me2 <- aggregate(df[vars], by = df[facs], FUN = mean)

  FE DI  SR1  SR2  SR3
1 FH DH  9.0 11.5 22.5
2 FL DH  8.0 16.5 21.5
3 FM DH 10.0 14.5 21.0
4 FH DL 11.0 16.5 18.0
5 FL DL 12.0 18.0 15.0
6 FM DL  8.5 13.0 24.0
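The by-argument approach can be wrapped in a small function so that the column names are passed purely as character vectors. mean_by() is a hypothetical name used only for illustration, and the sketch assumes the df, vars and facs objects from the question.

```r
df <- data.frame(FE = rep(c("FL", "FM", "FH"), 4), DI = rep(c("DL", "DH"), each = 6),
                 SR1 = rpois(12, 10), SR2 = rpois(12, 15), SR3 = rpois(12, 20))
vars <- c("SR1", "SR2", "SR3")
facs <- c("FE", "DI")

# Hypothetical helper (not part of base R): means of the columns named in vars,
# grouped by the columns named in facs
mean_by <- function(data, vars, facs) {
  # Single brackets keep data frames (which are lists), as aggregate() needs
  # for x and by, and preserve the column names that label the result
  aggregate(data[vars], by = data[facs], FUN = mean)
}

df.me1 <- mean_by(df, vars[1], facs[1])  # columns FE, SR1
df.me2 <- mean_by(df, vars, facs)        # columns FE, DI, SR1, SR2, SR3
```

The single brackets are the key design choice here: double brackets such as df[[vars[1]]] would return a bare vector and drop the column name.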
jdobres
Your solution works for the aggregate example here; however, it doesn't solve my actual problem of converting character strings to variable names used in a formula. Therefore, I prefer the answers using eval(parse(text = "A string to execute")) or get(), which are more general solutions to my problem. Thank you anyway. – Myosotis Oct 19 '16 at 13:21
For a more generic way of dealing with strings in formulas, I like using eval(parse(text = "A string to execute")). For example, in your code:

eval(parse(text = paste("df.me1<-aggregate(",vars[1],"~",facs[1],",df,mean)",sep="")))

which gives the following result:

> df.me1
  FE   SR1
1 FH  9.75
2 FL 10.75
3 FM 10.25

I also find this approach useful for retrieving an element of a list when the element's name is stored in a string.
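A small illustration of that use, with a hypothetical list results and key string key (neither comes from the question). For comparison it also shows the [[ ]] operator, which accepts a character index directly and returns the same element without building and parsing code.

```r
# Hypothetical list whose elements we want to reach by name
results <- list(means = c(a = 1, b = 2), counts = c(a = 10, b = 20))
key <- "means"

# eval(parse()): build the text "results$means" and evaluate it
x1 <- eval(parse(text = paste0("results$", key)))

# [[ ]] takes the character index directly, no parsing involved
x2 <- results[[key]]

identical(x1, x2)  # TRUE
```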

Here is the string that the paste command builds:

> paste("df.me1<-aggregate(",vars[1],"~",facs[1],",df,mean)",sep="")
[1] "df.me1<-aggregate(SR1~FE,df,mean)"

For the second part:

eval(parse(text = paste("df.me2<-aggregate(cbind(",vars[1],",",vars[2],",",vars[3],")~",facs[1],"+",facs[2],",df,mean)",sep="")))
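The paste() call above hard-codes vars[1] to vars[3] and facs[1] to facs[2]. A sketch of the same eval(parse()) approach that builds the command from the whole vectors with paste(..., collapse = ...), so it adapts to any number of variables or factors; it assumes df, vars and facs from the question, and the intermediate names lhs, rhs and cmd are illustrative only.

```r
df <- data.frame(FE = rep(c("FL", "FM", "FH"), 4), DI = rep(c("DL", "DH"), each = 6),
                 SR1 = rpois(12, 10), SR2 = rpois(12, 15), SR3 = rpois(12, 20))
vars <- c("SR1", "SR2", "SR3")
facs <- c("FE", "DI")

# Join all names instead of indexing them one by one
lhs <- paste0("cbind(", paste(vars, collapse = ","), ")")  # "cbind(SR1,SR2,SR3)"
rhs <- paste(facs, collapse = "+")                         # "FE+DI"
cmd <- paste0("df.me2<-aggregate(", lhs, "~", rhs, ",df,mean)")
cmd  # "df.me2<-aggregate(cbind(SR1,SR2,SR3)~FE+DI,df,mean)"

# Parse the text into an expression and run it, as in the answer
eval(parse(text = cmd))
```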
Cyrillm_44
Great, I like your answer best. I can see that eval(parse(text = "A string to execute")) is a more general solution and really solves my problem. Thank you so much! – Myosotis Oct 19 '16 at 13:16
@jdobres's answer is cleaner and probably better in most cases, but if you must do this exactly as you've written it, then (referencing this answer) you can simply use get().

df.me2<-aggregate(cbind(SR1,SR2,SR3)~FE+DI,df,mean)
df.me2.get<-aggregate(cbind(get(vars[1]),get(vars[2]),get(vars[3]))~get(facs[1])+get(facs[2]),df,mean)

Checking whether they are the same:

df.me2 == df.me2.get

       FE   DI  SR1  SR2  SR3
[1,] TRUE TRUE TRUE TRUE TRUE
[2,] TRUE TRUE TRUE TRUE TRUE
[3,] TRUE TRUE TRUE TRUE TRUE
[4,] TRUE TRUE TRUE TRUE TRUE
[5,] TRUE TRUE TRUE TRUE TRUE
[6,] TRUE TRUE TRUE TRUE TRUE
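One caveat worth checking: == compares the two data frames cell by cell, so it does not confirm that their column names agree. The columns of df.me2.get are likely to be labelled after the expressions in the formula (for example get(facs[1])) rather than FE, DI and so on. A sketch, assuming df, vars and facs from the question, that copies the intended names over and then compares the whole objects with all.equal():

```r
df <- data.frame(FE = rep(c("FL", "FM", "FH"), 4), DI = rep(c("DL", "DH"), each = 6),
                 SR1 = rpois(12, 10), SR2 = rpois(12, 15), SR3 = rpois(12, 20))
vars <- c("SR1", "SR2", "SR3")
facs <- c("FE", "DI")

df.me2 <- aggregate(cbind(SR1, SR2, SR3) ~ FE + DI, df, mean)
df.me2.get <- aggregate(cbind(get(vars[1]), get(vars[2]), get(vars[3])) ~
                          get(facs[1]) + get(facs[2]), df, mean)

names(df.me2.get)  # inspect the labels the get() version received

# Relabel with the names held in facs and vars, then compare values and names together
names(df.me2.get) <- c(facs, vars)
all.equal(df.me2, df.me2.get)
```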
BLT