I am trying to write an R function that calculates the mean of a specified pollutant (nitrate or sulfate) from the data for one or more of 332 monitoring stations. Each station's data is held in a separate file, numbered 1:332. I am new to R and, to be fair to anyone who chooses to help me, I should say that this is a homework problem. I have written the function below, which works for just one file:
pollutantmean <- function(directory, pollutant, id = 1:332) {
  filepath <- "/Users/jim/Documents/Coursera/2_R_Prog/Data"
  for(i in seq_along(id)) {
    if(id < 10) {
      name <- paste("00", id[i], sep = "")
    }
    if(id >= 10 && id < 100) {
      name <- paste("0", id[i], sep = "")
    }
    if(id >= 100) {
      name <- id[i]
    }
  }
  file <- paste(name, "csv", sep = ".")
  station <- paste(filepath, directory, file, sep = "/")
  monitor <- read.csv(station)
  if(pollutant == "nitrate") {
    x <- mean(monitor$nitrate, na.rm = T)
  }
  if(pollutant == "sulfate") {
    x <- mean(monitor$sulfate, na.rm = T)
  }
  x
}
However, if I pass more than one file ID (e.g. 70:72), I get the mean for the last file only (72). This suggests to me that the function is calculating the mean for each file and then overwriting it with the mean of the next, so that only the last one is returned. I think I could solve this using rbind(), but I can't figure out how to give each file's data a unique variable name so that those names can become the arguments to rbind(). I would be grateful for any help anyone can offer. Cheers, Jim
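To make the overwriting I suspect more concrete, here is a minimal sketch that runs without any of the data files. The vector file_means and its values are made up purely for illustration (they are not my real results). The only point is that assigning to the same variable on every pass of a loop leaves just the value from the last pass, which may or may not be what is really happening inside my function.

```r
# Hypothetical per-file means standing in for files 70, 71 and 72
# (invented numbers, for illustration only)
file_means <- c("070" = 0.5, "071" = 1.2, "072" = 2.0)

x <- NA
for (i in seq_along(file_means)) {
  # Each assignment replaces whatever x held before
  x <- file_means[[i]]
}

# Only the value assigned in the final iteration remains
x
```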