Create formula inside loop over column names

Question

I would like to loop through the columns of a data set and use the name of each column to aggregate the data set. However, I am getting an error when I try to pass the column name into the `aggregate` function:

"Error in model.frame.default(formula = cbind(SurveyID) ~ Panel + Category + : variable lengths differ (found for 'i')"

Once I can store this in a temp file, I will add the temp file to a permanent dataset; however, I can't get past this part. Any help would be much appreciated!

# Example of my data:
    df <- data.frame("SurveyID" = c('A','B','C','D'),
                     "Panel"    = c('E','E','S','S'),
                     "Category" = c(1,1,2,3),
                     "ENG" = c(3,3,1,2), "PAR" = c(3,1,1,2),
                     "REL" = c(3,1,1,2), "CLC" = c(3,1,1,2))

# For loop to get the column name to include as part of the aggregate function
    for (i in colnames(df[4:7])) {
      print(i)
      temp <- data.frame(setNames(aggregate(cbind(SurveyID) ~ Panel + Category + i, data = df, FUN = length),
                                  c("Panel", "GENDER", "Favlev", "Cnt")))
    }

I believe this is happening because the i that you have in the loop is not in the scope of the data frame you're making that formula for. I think `aggregate` uses non-standard evaluation to find the value for the formula. Alternatively, you could make the formula before calling aggregate like `as.formula(paste(...+ Category + i))` — svenhalvorson, Jan 31 '20 at 18:11
Substituting in `df[[i]]` worked. I am still learning R; why does it need the double `[[ ]]`? I tried this with just a single `[ ]` before I posted and also got an error. — user9084595, Jan 31 '20 at 18:55
The single `[ ]` is looking for an index number, while the double `[[ ]]` will take a name, which is useful in cases such as this. See `?"[["` for more information — Dave2e, Jan 31 '20 at 19:11
@Dave2e Both "[" and "[[" can take name arguments. What's different about them is the type of returned value. — IRTFM, Jan 31 '20 at 23:11
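The points raised in the comments can be pulled together in a small sketch, assuming a trimmed copy of the question's `df` that keeps only the `ENG` survey column. Since `i` is not a column of `df`, the `i` in the formula is, as far as I can tell, looked up in the environment where the formula was created, where it is the one-element string `"ENG"`; one value against four rows is what produces "variable lengths differ (found for 'i')". Substituting `df[[i]]` hands the formula the whole column, and the last lines show the difference IRTFM describes: with a name, `[` returns a one-column data frame while `[[` returns the numeric vector itself.

```r
# Trimmed copy of the question's data (only one survey column kept)
df <- data.frame(SurveyID = c("A", "B", "C", "D"),
                 Panel    = c("E", "E", "S", "S"),
                 Category = c(1, 1, 2, 3),
                 ENG      = c(3, 3, 1, 2))

i <- "ENG"  # what the loop variable holds on its first pass

# 1. The formula names a variable literally called `i`, not ENG.
bad <- SurveyID ~ Panel + Category + i
all.vars(bad)  # "SurveyID" "Panel" "Category" "i"

# `i` is not a column of df, so it is taken from the formula's environment
# as the single string "ENG": 1 value against 4 rows, hence the error.
err <- tryCatch(aggregate(bad, data = df, FUN = length),
                error = function(e) conditionMessage(e))
print(err)

# 2. The workaround from the comments: index the column explicitly.
good <- aggregate(SurveyID ~ Panel + Category + df[[i]], data = df, FUN = length)
print(good)

# 3. `[` vs `[[` with a name: same column, different return type.
one_bracket <- df[i]    # a one-column data frame
two_bracket <- df[[i]]  # the numeric vector itself
class(one_bracket)      # "data.frame"
class(two_bracket)      # "numeric"
```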

score 1 · Answer 1 · answered Jan 31 '20 at 22:39

You are making one newbie mistake and one more sophisticated mistake:

Newb mistake: failing to index successive items upon assignment, i.e., overwriting earlier values with new values.

Not-so-newb mistake: improper construction of formula objects. You need `as.formula` to turn a pasted string containing the column name into a formula.
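Both mistakes can be seen in isolation in a short sketch; the names `cols`, `last_only` and `kept` are illustrative and not part of the answer. The first loop reassigns the same object on every pass, so only the last column's result survives; the second stores each result under its column name. The formula is then built from a string with `as.formula()`; base R's `reformulate()` is an alternative that takes the terms as a character vector and should give the same formula.

```r
cols <- c("ENG", "PAR", "REL", "CLC")

# Mistake 1: plain assignment overwrites, so only the last pass is kept.
last_only <- NULL
for (i in cols) last_only <- paste("result for", i)
last_only          # "result for CLC"

# Fix: assign into a named list so every pass is kept.
kept <- list()
for (i in cols) kept[[i]] <- paste("result for", i)
names(kept)        # "ENG" "PAR" "REL" "CLC"

# Mistake 2: build the formula from text so the column name,
# not the symbol `i`, ends up in it.
i <- "ENG"
form  <- as.formula(paste("SurveyID ~ Panel + Category +", i))
form2 <- reformulate(c("Panel", "Category", i), response = "SurveyID")
all.vars(form)     # "SurveyID" "Panel" "Category" "ENG"
```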

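temp <- list()   # create the empty list first so results can be assigned into it
for (i in colnames(df[4:7])) {
  print(i)
  # Build the formula from text so it contains the column name, not `i`
  form <- as.formula(paste("SurveyID ~ Panel + Category +", i))
  # Store each result under its column name instead of overwriting `temp`
  temp[[i]] <- data.frame(setNames(aggregate(form, data = df, FUN = length),
                                   c("Panel", "GENDER", "Favlev", "Cnt")))
}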
#Output
[1] "ENG"
[1] "PAR"
[1] "REL"
[1] "CLC"

 str(temp)
#----------------
List of 4
 $ ENG:'data.frame':    3 obs. of  4 variables:
  ..$ Panel : Factor w/ 2 levels "E","S": 2 2 1
  ..$ GENDER: num [1:3] 2 3 1
  ..$ Favlev: num [1:3] 1 2 3
  ..$ Cnt   : int [1:3] 1 1 2
 $ PAR:'data.frame':    4 obs. of  4 variables:
  ..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
  ..$ GENDER: num [1:4] 1 2 3 1
  ..$ Favlev: num [1:4] 1 1 2 3
  ..$ Cnt   : int [1:4] 1 1 1 1
 $ REL:'data.frame':    4 obs. of  4 variables:
  ..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
  ..$ GENDER: num [1:4] 1 2 3 1
  ..$ Favlev: num [1:4] 1 1 2 3
  ..$ Cnt   : int [1:4] 1 1 1 1
 $ CLC:'data.frame':    4 obs. of  4 variables:
  ..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
  ..$ GENDER: num [1:4] 1 2 3 1
  ..$ Favlev: num [1:4] 1 1 2 3
  ..$ Cnt   : int [1:4] 1 1 1 1
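The question's stated next step is to add the per-column results to one dataset. One way to do that, sketched here as an alternative to the loop rather than as part of the answer, is to build the list with `lapply()` and stack it with `do.call(rbind, ...)`, which works because every element has the same four column names. The `Variable` column is a hypothetical addition that records which survey column each row came from.

```r
df <- data.frame(SurveyID = c("A", "B", "C", "D"),
                 Panel    = c("E", "E", "S", "S"),
                 Category = c(1, 1, 2, 3),
                 ENG = c(3, 3, 1, 2), PAR = c(3, 1, 1, 2),
                 REL = c(3, 1, 1, 2), CLC = c(3, 1, 1, 2))

per_col <- lapply(colnames(df)[4:7], function(col) {
  # Formula built from a character vector of term names
  form <- reformulate(c("Panel", "Category", col), response = "SurveyID")
  res  <- setNames(aggregate(form, data = df, FUN = length),
                   c("Panel", "GENDER", "Favlev", "Cnt"))
  res$Variable <- col   # hypothetical tag: which survey column this block is from
  res
})

combined <- do.call(rbind, per_col)  # stack the four data frames row-wise
nrow(combined)                       # 15 (3 rows for ENG, 4 each for the others)
```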
