I frequently want to perform functions on a set or range of columns in a dataframe. Most commonly, I want to take the mean of a range of columns that share a common prefix (in this toy example, VarA., VarB., and VarC.):
ID<-c(1:300) #participant ID column, N=300
VarA.1<-sample(x = 0:50,size = 300, replace = TRUE)
VarA.2<-sample(x = 0:50,size = 300, replace = TRUE)
VarA.3<-sample(x = 0:50,size = 300, replace = TRUE)
VarB.1<-sample(x = 0:30,size = 300, replace = TRUE)
VarB.2<-sample(x = 0:30,size = 300, replace = TRUE)
VarB.3<-sample(x = 0:30,size = 300, replace = TRUE)
VarC.1<-sample(x = 0:10,size = 300, replace = TRUE)
VarC.2<-sample(x = 0:10,size = 300, replace = TRUE)
VarC.3<-sample(x = 0:10,size = 300, replace = TRUE)
df<-data.frame(ID,VarA.1,VarA.2,VarA.3,
               VarB.1,VarB.2,VarB.3,
               VarC.1,VarC.2,VarC.3)
rm(ID,VarA.1,VarA.2,VarA.3,
   VarB.1,VarB.2,VarB.3,
   VarC.1,VarC.2,VarC.3)
I usually have a ton of variables, so I can't memorize the column numbers. Let's say I want to take the average of all columns starting with VarA. and put it in a new column called VarA. Here is my usual approach:
x<-which(colnames(df)=="VarA.1")
y<-which(colnames(df)=="VarA.3")
df$VarA<-rowMeans(df[, c(x:y)])
Maybe I'm being too picky, but I have to do this (or something very similar) upwards of 20 times in some scripts, and it looks messy and clunky. It's also hard to remember: I have to dig up a previous file, copy and paste, and carefully change all the values to fit my current dataset. I'd really like to turn this into a function, but I'm not very familiar with user-defined functions, and I'm having trouble figuring out how to deal with multiple variables.
The approach I tried was:
colmeans <- function(x,y,df,meancol) {
  first<-which(colnames(df)==x)
  last<-which(colnames(df)==y)
  df$meancol<-rowMeans(df[, c(first:last)])
}
colmeans("VarA.1","VarA.3",df,"VarA")
I could have sworn it worked at one point but I lost it and I can't remember what I changed. What am I missing?
I'm also open to other ideas about how to make this process more efficient.
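For reference, here is a minimal sketch of one possible direction, not a definitive fix. It rests on two facts about R: `df$meancol` creates a column literally named "meancol" rather than using the value of the `meancol` argument, and a function works on its own copy of `df`, so the result has to be returned and assigned back. The helper name `row_mean_by_prefix`, its `newcol` default, and the small `toy` data frame are all hypothetical and exist only for illustration. It also selects columns by prefix instead of by first and last column name, which is a swapped-in technique rather than what the attempt above does.

```r
# Hypothetical helper (not from the original attempt): average every column
# whose name starts with `prefix` and store the result in column `newcol`.
row_mean_by_prefix <- function(df, prefix, newcol = sub("\\.$", "", prefix)) {
  # startsWith() compares plain strings, so the "." in "VarA." is literal
  cols <- startsWith(colnames(df), prefix)
  if (!any(cols)) stop("no columns start with '", prefix, "'")
  # [[ ]] uses the *value* of newcol as the column name;
  # df$newcol would create a column literally called "newcol"
  df[[newcol]] <- rowMeans(df[, cols, drop = FALSE])
  # Return the modified copy; the caller's df is not changed in place
  df
}

# Small illustrative data frame (assumed, much smaller than the real data)
set.seed(1)
toy <- data.frame(ID = 1:5,
                  VarA.1 = sample(0:50, 5, replace = TRUE),
                  VarA.2 = sample(0:50, 5, replace = TRUE),
                  VarA.3 = sample(0:50, 5, replace = TRUE),
                  VarB.1 = sample(0:30, 5, replace = TRUE),
                  VarB.2 = sample(0:30, 5, replace = TRUE))

# The result must be assigned back, otherwise the new column is lost
toy <- row_mean_by_prefix(toy, "VarA.")
toy <- row_mean_by_prefix(toy, "VarB.")
```

With many prefixes, the same call could presumably be repeated in a `for` loop over a character vector of prefixes, reassigning the data frame each time.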