
I'm trying to do some data analysis as follows: I have about 100 subjects, each of whom has a file containing 40,000 lines of numbers. I also have an index file with 40,000 corresponding lines, each containing a group number. I am trying to get the mean of each group, for each subject. I can do this easily for one subject with tapply, like this:

tapply(df$numbers, df$group, mean)
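
For example, with a tiny made-up data frame (the numbers are hypothetical, just to show the shape of the output):

df <- data.frame(numbers = c(1.2, 3.4, 2.2, 5.0, 4.8, 0.6),
                 group   = c(1, 1, 2, 2, 3, 3))
tapply(df$numbers, df$group, mean)
#   1   2   3
# 2.3 3.6 2.7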

I can also load in a data frame containing the filenames of each subject's data. What I'd like to do is create a for loop in which I can get the output of the above tapply function for each subject, probably by looping over the filenames and pulling in each one as a new data frame (maybe??). And ultimately I'll want to output this to a .csv with subject names as rows and group names as columns.
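
Roughly, I imagine it looking something like the sketch below (untested; the file of filenames, the index file name, and the column names are just placeholders):

subject_files <- read.table("subject_filenames.txt", stringsAsFactors = FALSE)  # placeholder
index <- read.table("group_index.txt")               # the 40,000 group labels
all_means <- c()
for (f in subject_files$V1) {
    numbers <- read.table(f)                         # one subject's 40,000 numbers
    all_means <- rbind(all_means, tapply(numbers$V1, index$V1, mean))
}
write.csv(all_means, "group_means_by_subject.csv")   # placeholder output name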

Right now I'm very stuck. Can anyone provide some insight?

EDIT: Here's my solution, provided by super helpful user jyr below, with some minor tweaking. One thing I wasn't clear about was that my legend (the file with the list of labels) is its own file, rather than a column in each data file. Also, tapply was being a jerk about argument length, so I had to do some extra data frame creation. Here's the final solution:

labels_L <- read.table("C:/Users/jakes/Desktop/HMAT-files/CIVET_HMAT_left.txt")
new_df <- c()
listfiles <- dir("C:/users/jakes/Desktop/HMAT-files/thickness/left")
for (f in listfiles) {
    thick <- read.table(file.path("C:/users/jakes/Desktop/HMAT-files/thickness/left", f), header = FALSE)
    df <- data.frame(labels_L, thick)                 # V1 = group label, V1.1 = thickness value
    new_line <- c(f, tapply(df$V1.1, df$V1, mean))    # file name plus the per-group means
    new_df <- rbind(new_df, new_line)
}
write.csv(new_df, "C:/users/jakes/Desktop/HMAT-thickness-L.csv")

Thank you so much for your help; this forum saved me countless hours!

    Hi Jake! To make this question better and easier for everyone else to help you, it would be great if you gave us a [reproducible example](https://stackoverflow.com/help/mcve). Also, if you can put an example of the code that you've done that would also be great. If you can put the first part of your data `head(df)` that will give us some data input to work with. We'll be able to help you much better then :) – Lasarus9 Feb 14 '19 at 20:23
  • You need to give us some context on the data. One great way to do this is to give us the output from `dput(x)` where `x` is a representative *but not big* sample of your data. This might be as simple as `head(dat,n=10)`, but make sure you have at least two of each "grouping" id. – r2evans Feb 14 '19 at 20:23
  • @Jake Thank you for this! I recommend you edit your question and add this to the question itself. It's good to keep things together! :) – Lasarus9 Feb 14 '19 at 20:28

1 Answer


You can read the file names with `dir()` and then loop over them: read each file, do your `tapply`, build a vector with the file name and the results, and stack the rows together with `rbind`. I hope this is similar to what you wanted, or at least pushes you in the right direction.

new_df <- c()
list_of_files <- dir("your_folder_where_data_is")
for (f in list_of_files) {
    df <- read.csv(file.path("your_folder_where_data_is", f))
    new_line <- c(f, tapply(df$V1.1, df$V1, mean))    # adjust the column names to match your data
    new_df <- rbind(new_df, new_line)
}
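
If you also want the .csv you described, with one row per file and one column per group, you could finish with something like this (just a sketch; the first-column name and the output path are placeholders):

colnames(new_df)[1] <- "subject"          # the first column currently holds the file name
write.csv(new_df, "group_means_by_subject.csv", row.names = FALSE)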
  • This looks very promising, thank you! I'll give it a shot tomorrow morning and report back. – Jake Feb 14 '19 at 23:51
  • And it worked, with some minor tweaking! Will post the full answer in my original post. Thank you!! – Jake Feb 15 '19 at 14:49