Here is the data I am working with: https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip

I'm trying to create a function called pollutantmean that will load selected files, combine their rows (rbind) into one data frame, and return the mean of a chosen column. I have figured out everything except how to write the loop that turns the multiple files into one big data frame.

for (id in 1:5) {
  files_full <- Sys.glob("*.csv")
  fileQ <- files_full[[id]]
  empty_tbl <- rbind(empty_tbl, read.csv(fileQ, header = TRUE))
}

This for loop works by itself, but when I try to use it inside my bigger function:

pollutantmean <- function(directory = "specdata", pollutant, id = 1:332) {
  empty_tbl <- data.frame()
  for (id in 1:332) {
    files_full <- Sys.glob("*.csv")
    fileQ <- files_full[[i]]
    empty_tbl <- rbind(empty_tbl, read.csv(fileQ, header = TRUE))
  }

  goodata <- na.omit(empty_tbl)
  if(pollutant == "sulfate") {
    mean(goodata[,2])
  } else {
    mean(goodata[,3])
  }
}

I get this error:

"Error in read.table(file = file, header = header, sep = sep, quote = quote, : 'file' must be a character string or connection".

I am at a complete loss over how to fix this and have tried many different approaches. I'm sure I'm messing something up with the naming of the file, but when I run the for loop by itself it works fine.

Leonardo

1 Answer

Consider using lapply() over the CSV files, building their paths from the function's directory argument instead of relying on the working directory. The code below assumes specdata is a subfolder of the current working directory:

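pollutantmean <- function(directory = "specdata", pollutant) {

   # First 332 CSVs in the directory
   files_full <- Sys.glob(file.path(directory, "*.csv"))[1:332]

   # Read every file, then stack all rows into one data frame
   dfList <- lapply(files_full, read.csv, header = TRUE)
   df <- do.call(rbind, dfList)

   # Drop rows with any NA; column 2 is sulfate, column 3 the other pollutant
   gooddata <- na.omit(df)
   pmean <- if (pollutant == "sulfate") mean(gooddata[, 2]) else mean(gooddata[, 3])

   # Return the value visibly (a bare assignment as the last line returns it invisibly)
   pmean
}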
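
If you also want to keep the id argument from the question, here is a rough sketch of one way to do it. Note that the loop in the question iterates over id but indexes with i, so the two names need to agree whichever approach you take. The function name pollutantmean_by_id is hypothetical, and the sketch assumes the file names sort in monitor order (001.csv, 002.csv, ...) and that the pollutant columns are literally named "sulfate" and "nitrate"; adjust if your files differ. It also uses na.rm = TRUE on the single column rather than na.omit() on the whole data frame, because na.omit() drops every row that has an NA in any column, which can change the mean.

```r
# Hypothetical variant that honors `id`; the name is illustrative only.
pollutantmean_by_id <- function(directory = "specdata", pollutant, id = 1:332) {
  # Sorted CSV paths; assumes names sort in monitor order (001.csv, 002.csv, ...)
  files_full <- sort(Sys.glob(file.path(directory, "*.csv")))
  stopifnot(length(files_full) > 0, all(id %in% seq_along(files_full)))

  # Read only the requested files and stack their rows
  df_list <- lapply(files_full[id], read.csv, header = TRUE)
  df <- do.call(rbind, df_list)

  # Assumes the columns are named "sulfate" and "nitrate"
  stopifnot(pollutant %in% names(df))

  # Ignore missing values in the chosen column only
  mean(df[[pollutant]], na.rm = TRUE)
}
```

You would call it as, for example, pollutantmean_by_id("specdata", "nitrate", id = 70:72).
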
Parfait