I have a large number of csv files that I want to read into R. All the csvs have the same column headings. From each file, I want to import into the data frame only those rows for which a variable is within a given range (above a min threshold and below a max threshold), e.g.
v1 v2 v3
1 x q 2
2 c w 4
3 v e 5
4 b r 7
Filtering for v3 (v3>2 & v3<7) should result in:
v1 v2 v3
1 c w 4
2 v e 5
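For illustration, the example above can be reproduced with a small in-memory data frame. This is only a sketch: the names `df`, `lower`, `upper` and `filtered` are placeholders chosen for this example, and the values are the ones from the table above.

```r
# Example data from the table above
df <- data.frame(v1 = c("x", "c", "v", "b"),
                 v2 = c("q", "w", "e", "r"),
                 v3 = c(2, 4, 5, 7))

# Thresholds (placeholder names, chosen to avoid masking base::min and base::max)
lower <- 2
upper <- 7

# Keep rows where v3 lies strictly between the two thresholds
filtered <- df[df$v3 > lower & df$v3 < upper, ]

# Renumber the rows so they start at 1 again
rownames(filtered) <- NULL
print(filtered)
```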
So far I import all the data from all csvs into one data frame and then do the filtering:
#Read the data files
fileNames <- list.files(path = workDir)
mergedFiles <- do.call("rbind", sapply(fileNames, read.csv, simplify = FALSE))
fileID <- row.names(mergedFiles)
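# Strip the ".csv" extension and the row suffix added by rbind (dot escaped so it matches literally)
fileID <- gsub("\\.csv.*", "", fileID)
#Combining data with file IDs
combFiles <- cbind(fileID, mergedFiles)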
#Filtering the data according to criteria
resultFile <- combFiles[combFiles$v3 > min & combFiles$v3 < max, ]
I would rather apply the filter while importing each single csv file into the data frame. I assume a for loop would be the best way of doing it, but I am not sure how. I would appreciate any suggestion.
Edit
After testing the suggestion from mnel, which worked, I ended up with a different solution:
fileNames = list.files(path = workDir)
mzList = list()
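for(i in seq_along(fileNames)){  # seq_along is safe when fileNames is empty
  tempData = read.csv(fileNames[i])
  # Indices of rows whose first column lies between minMZ and maxMZ
  mz.idx = which(tempData[ ,1] > minMZ & tempData[ ,1] < maxMZ)
  mz1 = tempData[mz.idx, ]
  # Attach the source file name to every retained row
  mzList[[i]] = data.frame(mz1, filename = rep(fileNames[i], length(mz.idx)))
}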
resultFile = do.call("rbind", mzList)
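The same per-file filtering can also be sketched without an explicit loop, using `lapply`. The snippet below is a minimal, self-contained illustration and not the code I actually ran: `readFiltered`, `demoDir`, `paths` and `resultDemo` are hypothetical names, the two demo files are written to a temporary directory, and the filter is applied to a column named `v3` as in the example at the top (my real data uses the first column instead).

```r
# Hypothetical helper: read one csv and keep rows where v3 is in (lower, upper)
readFiltered <- function(path, lower, upper) {
  dat <- read.csv(path)
  keep <- which(dat$v3 > lower & dat$v3 < upper)  # which() drops NA comparisons
  dat <- dat[keep, , drop = FALSE]
  # Record the source file (name without directory or extension) for each kept row
  dat$fileID <- rep(sub("\\.csv$", "", basename(path)), nrow(dat))
  dat
}

# Demo data: two small csv files in a temporary directory (assumed layout)
demoDir <- file.path(tempdir(), "csv_demo")
dir.create(demoDir, showWarnings = FALSE)
write.csv(data.frame(v1 = c("x", "c"), v2 = c("q", "w"), v3 = c(2, 4)),
          file.path(demoDir, "a.csv"), row.names = FALSE)
write.csv(data.frame(v1 = c("v", "b"), v2 = c("e", "r"), v3 = c(5, 7)),
          file.path(demoDir, "b.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv can open from any working directory
paths <- list.files(demoDir, pattern = "\\.csv$", full.names = TRUE)

# Filter each file as it is read, then stack the filtered pieces
resultDemo <- do.call("rbind", lapply(paths, readFiltered, lower = 2, upper = 7))
print(resultDemo)
```

Only the filtered rows of each file are kept in memory at the time of the final `rbind`, which is the point of filtering during import.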
Thanks for all the suggestions!