I'm currently writing a program (full disclosure, it's "homework"). The program reads a series of files selected by a given range of IDs, collates them into one large table with the NAs removed, and finds the mean of the given pollutant (which is a column in the table).
I wrote the program previously, but wanted to play around with compartmentalising the functions a bit more, so I rewrote it.
Strangely, some ranges return exactly the same number as the original program, while others return (relatively) radically different results.
For instance:
pollutantmean("specdata", "sulfate", 1:10)
Old Program: 4.064128
New Program: 4.064128
pollutantmean("specdata", "nitrate", 23)
Old Program: 1.280833
New Program: 1.280833
pollutantmean("specdata", "nitrate", 70:72)
Old Program: 1.706047
New Program: 1.732979
In that final example, the old program is producing the expected result, while the new program is producing a result not within the acceptable margin of error at all.
I'm simply at a loss. I've been rewriting my new code to minimise its differences from the old code without simply reproducing the old program, but nothing is working: I continue to receive the exact same (bad) result despite quite a few changes. The current code is below, followed by the original program.
New Program:
concatTables <- function(directory, id, hasHeader = TRUE, keepNAs = FALSE) {
    totalTable <- NULL
    currentTable <- NULL
    for (file in id) {
        filename <- paste(
            sep = "",
            directory, "/", formatC(file, width = 3, format = "d", flag = "0"), ".csv"
        )
        currentTable <- read.csv(file = filename, header = hasHeader)
        if (!is.null(totalTable)) {
            totalTable <- rbind(totalTable, currentTable)
        } else {
            totalTable <- currentTable
        }
    }
    if (!keepNAs) {
        totalTable <- completeRows(totalTable)
    }
    totalTable
}
completeRows <- function(table) {
    table <- table[complete.cases(table), ]
    table
}
pollutantmean <- function(directory = paste(getwd(), "/specdata", sep = ""), pollutant, id = 1:332, hasHeader = TRUE, keepNAs = FALSE) {
    table <- NULL
    table <- concatTables(directory, id, hasHeader, keepNAs)
    tableMean <- mean(table[[pollutant]])
    tableMean
}
Old Program (which produces the expected results):
dataFileName <- NULL
pollutantmean <- function(directory = "specdata", pollutant, id = 1:332, idWidth = 3, fullLoop = TRUE) {
    dataFrame <- NULL
    dataFrameTotal <- NULL
    for (i in id) {
        dataFileName <- paste(directory, "/", formatC(i, width = idWidth, flag = 0), ".csv", sep = "")
        if (!is.null(dataFileName)) {
            dataFileConnection <- file(dataFileName)
            dataFrame <- read.csv(dataFileConnection, header = TRUE)
            dataFrameTotal <- rbind(dataFrame, dataFrameTotal)
            ##close(dataFileConnection)
            if (fullLoop == FALSE) {
                break
            }
        }
        else print("DATAFILENAME IS NULL!")
    }
    print(mean(dataFrameTotal[[pollutant]], na.rm = TRUE))
}
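One difference between the two versions that may be worth checking: the new program drops every row that has an NA in any column (complete.cases on the whole table) before averaging, while the old program only ignores NAs in the requested pollutant column (na.rm = TRUE). The sketch below shows how those two approaches can disagree. The data frame `toy` and its values are made up purely for illustration and are not taken from specdata; whether this accounts for the 70:72 discrepancy would need to be confirmed against the real files.

```r
# Hypothetical data: sulfate is NA in some rows where nitrate is present.
toy <- data.frame(
    sulfate = c(1.0, NA, 3.0, NA),
    nitrate = c(2.0, 4.0, NA, 6.0)
)

# Old-program style: ignore NAs in the requested column only.
mean_column_only <- mean(toy[["nitrate"]], na.rm = TRUE)

# New-program style: drop rows with an NA in ANY column, then average.
complete_rows <- toy[complete.cases(toy), ]
mean_complete_rows <- mean(complete_rows[["nitrate"]])

print(mean_column_only)    # uses the nitrate values 2, 4 and 6
print(mean_complete_rows)  # uses only the first row, the one with no NAs
```

If the files for some IDs contain rows where one pollutant is recorded and the other is missing, the two approaches would average over different sets of rows, which would explain why some ranges agree and others do not.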