I am trying to write a function in R which takes 3 inputs:
- Directory
- pollutant
- id
I have a directory on my computer full of CSV's files i.e. over 300. What this function would do is shown in the below prototype:
pollutantmean <- function(directory, pollutant, id = 1:332) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files
## 'pollutant' is a character vector of length 1 indicating
## the name of the pollutant for which we will calculate the
## mean; either "sulfate" or "nitrate".
## 'id' is an integer vector indicating the monitor ID numbers
## to be used
## Return the mean of the pollutant across all monitors list
## in the 'id' vector (ignoring NA values)
}
An example output of this function is shown here:
source("pollutantmean.R")
pollutantmean("specdata", "sulfate", 1:10)
## [1] 4.064
pollutantmean("specdata", "nitrate", 70:72)
## [1] 1.706
pollutantmean("specdata", "nitrate", 23)
## [1] 1.281
I can read the whole thing in one go by:
path = "C:/Users/Sean/Documents/R Projects/Data/specdata"
fileList = list.files(path=path,pattern="\\.csv$",full.names=T)
all.files.data = lapply(fileList,read.csv,header=TRUE)
DATA = do.call("rbind",all.files.data)
My issue are:
- User enters id either atomic or in a range e.g. suppose user enters 1 but the file name is 001.csv or what if user enters a range 1:10 then file names are 001.csv ... 010.csv
- Column is enetered by user i.e. "sulfate" or "nitrate" which he/she is interested in getting the mean of...There are alot of missing values in these columns (which i need to omit from the column before calculating the mean.
The whole data from all the files look like this :
summary(DATA)
Date sulfate nitrate ID
2004-01-01: 250 Min. : 0.0 Min. : 0.0 Min. : 1.0
2004-01-02: 250 1st Qu.: 1.3 1st Qu.: 0.4 1st Qu.: 79.0
2004-01-03: 250 Median : 2.4 Median : 0.8 Median :168.0
2004-01-04: 250 Mean : 3.2 Mean : 1.7 Mean :164.5
2004-01-05: 250 3rd Qu.: 4.0 3rd Qu.: 2.0 3rd Qu.:247.0
2004-01-06: 250 Max. :35.9 Max. :53.9 Max. :332.0
(Other) :770587 NA's :653304 NA's :657738
Any idea how to formulate this would be highly appreciated...
Cheers