I have a data set (called 'data' here) which contains three important kinds of columns: a 'label' column, which corresponds to a list of institutions; a 'group' column, which states the group each institution belongs to; and a series of 'measure' columns, which give a numerical score for each institution on different outcomes/measures.
My task is to write a function which takes user-specified groups and measures, and finds the institution within the given group which has the lowest score on the given measure.
I wrote more or less the following, although it is pared down a little and the labels are generic:
func <- function(group, measure) {
data <- read.csv("data.csv")
dataSubset <- data[, c(1, 2, 3, 4, 5)]
headings <- colnames(dataSubset)
measureInputs <- as.character(c("m1", "m2", "m3"))
# A vector of accepted inputs for 'measure', corresponding
# roughly to column names in 'dataSubset'
nameBinding <- as.list(mapply(assign, measureInputs, headings[3:5]))
# Assigns each accepted input to a cognate column name in 'dataSubset'
groupWiselist <- split(dataSubset, dataSubset$Groupcolumn)
# Splits 'dataSubset' by individual groups in the group column (column 2)
# into distinct groupwise data frames
inputGroupdata <- groupWiselist$group
# Creates a single data frame, corresponding to the subset of dataSubset
# picked out by the user specified group
inputMeasurecolumn <- as.vector(inputGroupdata[[nameBinding[[as.character(measure)]]]])
# Creates a vector of values contained in the user specified column
# ('measure'), within the values containing the user specified group
labelMin <- inputGroupdata$Labelcolumn[inputMeasurecolumn == min(inputMeasurecolumn)]
# Finds the label within 'Labelcolumn' on the same row as the minimum
# value of the user specified column
return(as.character(labelMin))
}
When I execute this function inputting my own values I get back:
Warning message: In min(inputMeasurecolumn) : no non-missing arguments to min; returning Inf
No such warning occurs when I run the code line by line. If I include an extra line in the code like return(inputMeasurecolumn) just after defining inputMeasurecolumn, the function returns NULL; when I run this line by line and input my own values as I go, inputMeasurecolumn returns a sensible vector of exactly the kind I would expect, and min(inputMeasurecolumn) gives me the minimum value of that vector as expected. The only difference I can see is that, when running line by line, I directly input the name of the measurement rather than the generic 'measure' variable that goes into the subsetting that forms inputMeasurecolumn. But since in both instances what goes in there are character objects that refer to column names (thanks to nameBinding), I really can't see what's up.
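A minimal sketch of one likely cause, assuming the symptom comes from the line inputGroupdata <- groupWiselist$group rather than from the 'measure' lookup: the `$` operator does not evaluate its right-hand side, so it looks for a list element literally named "group" and returns NULL when there is none, whereas `[[` uses the value stored in the variable. The toy data frame, its values, and the column name score1 below are made up for illustration and are not taken from the real data.

```r
# Toy data; names and values are hypothetical, for illustration only.
toy <- data.frame(
  Labelcolumn = c("inst1", "inst2", "inst3", "inst4"),
  Groupcolumn = c("a", "a", "b", "b"),
  score1 = c(3, 1, 5, 2),
  stringsAsFactors = FALSE
)

groupWiselist <- split(toy, toy$Groupcolumn)
group <- "a"  # stands in for the function argument

# `$` treats `group` as a literal element name, not as a variable.
# The list has elements "a" and "b" but none called "group".
viaDollar <- groupWiselist$group
is.null(viaDollar)  # TRUE

# `[[` evaluates its argument, so the element named "a" is returned.
viaBrackets <- groupWiselist[[group]]
minLabel <- viaBrackets$Labelcolumn[viaBrackets$score1 == min(viaBrackets$score1)]
minLabel  # "inst2"
```

If this is what is happening, it would also fit the line-by-line behaviour: typing the actual group name after `$` works, while the literal word group inside the function does not. Since subsetting NULL gives NULL, min() would then receive an empty input and produce the warning shown above.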