I'm trying to do some data analysis as follows: I have about 100 subjects, each of whom has a file containing 40,000 lines of numbers. I also have an index file with 40,000 corresponding lines, each containing a group number. I am trying to get the mean of each group for each subject. I can do this easily for one subject with tapply, like this:
tapply(df$numbers, df$group, mean)
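To make the expected output concrete, here is a toy version with made-up numbers (the data frame below is hypothetical, not my real data):

```r
# Hypothetical toy data: six values split across three groups
df <- data.frame(numbers = c(1, 3, 10, 20, 5, 7),
                 group   = c(1, 1, 2, 2, 3, 3))

# Mean of `numbers` within each level of `group`
group_means <- tapply(df$numbers, df$group, mean)
print(group_means)
```

The result is a named vector with one mean per group, which is the row I want to produce for every subject.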
I can also load in a data frame containing the filenames of each subject's data. What I'd like to do is create a for loop in which I can get the output of the above tapply function for each subject, probably by looping over the filenames and pulling in each one as a new data frame (maybe??). And ultimately I'll want to output this to a .csv with subject names as rows and group names as columns.
Right now I'm very stuck. Can anyone provide some insight?
EDIT: Here's my solution, provided by super helpful user jyr below, with some minor tweaking. One thing I wasn't clear about was that my legend (the file with the list of labels) is its own file, rather than a column in each data file. Also, tapply was being a jerk about argument length, so I had to do some extra data frame creation. Here's the final solution:
# Group labels: one row per data line, shared by every subject
labels_L <- read.table("C:/Users/jakes/Desktop/HMAT-files/CIVET_HMAT_left.txt")
data_dir <- "C:/users/jakes/Desktop/HMAT-files/thickness/left"
new_df <- c()
listfiles <- dir(data_dir)
for (f in listfiles) {
  thick <- read.table(file.path(data_dir, f), header = FALSE)
  # Both inputs have a column named V1, so data.frame() renames the second one to V1.1
  df <- data.frame(labels_L, thick)
  # One row per subject: the file name followed by the mean of each group
  new_line <- c(f, tapply(df$V1.1, df$V1, mean))
  new_df <- rbind(new_df, new_line)
}
write.csv(new_df, "C:/users/jakes/Desktop/HMAT-thickness-L.csv")
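For anyone who wants to try the same pattern without my files, here is a minimal sketch on made-up data. Everything in it is an assumption for illustration: the temporary directory, the three subject file names, the random values, and the `group_means` helper. It swaps the for loop for `sapply`, which is a variation and not what I actually ran.

```r
# Hypothetical stand-in data: 3 subjects, 8 lines each, 2 groups, in a temp dir
set.seed(1)
data_dir <- file.path(tempdir(), "thickness_demo")
dir.create(data_dir, showWarnings = FALSE)
labels <- data.frame(V1 = rep(c(1, 2), each = 4))
for (s in c("subj01.txt", "subj02.txt", "subj03.txt")) {
  write.table(data.frame(runif(8)), file.path(data_dir, s),
              row.names = FALSE, col.names = FALSE)
}

# Mean per group for one subject file (hypothetical helper)
group_means <- function(f) {
  thick <- read.table(file.path(data_dir, f), header = FALSE)
  # tapply() needs values and labels of equal length, so check first
  stopifnot(nrow(thick) == nrow(labels))
  tapply(thick$V1, labels$V1, mean)
}

listfiles <- dir(data_dir)
# sapply() returns groups x subjects; t() flips it to subjects x groups
result <- t(sapply(listfiles, group_means))
out_file <- file.path(tempdir(), "thickness_demo.csv")
write.csv(result, out_file)
```

Because the file names end up as row names instead of being pasted into each row with `c()`, the means stay numeric in `result` rather than being coerced to character.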
Thank you so much for your help. This forum saved me countless hours!