I have been banging my head against this for the last couple of hours and still cannot resolve it.
I am trying to write an R function that takes a data frame and a column name as arguments and returns a data frame with all distinct values of the specified column, minus any NA or "N/A" values.
Here is my function:
getDistinctColValues <- function(dataset, colname, removeNA = FALSE) {
  colname <- as.name(colname)
  retVector <- dataset %>% distinct_(colname)
  # Not working!
  if (removeNA == TRUE)
  {
    retVector <- filter_(retVector, colname != "N/A" | !is.null(colname))
  }
  return(retVector)
}
Here is a sample output (see the N/A):
> getDistinctColValues(dataTY, "SomeColumn", TRUE)
SomeColumn
1 BR
2 ET
3 SG
4 BV
5 N/A
6 MN
7 SP
This filter is not working. na.omit won't work because the values are literal "N/A" strings rather than true NA values. I am also not clear on how to opt out of non-standard evaluation (NSE); I am using the lazyeval package without understanding it in depth.
Any help will be greatly appreciated.
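A likely reason the filter keeps every row: the condition is evaluated as ordinary R code before filter_ ever sees it, so it compares the column's name, not the column's values. The base-R sketch below illustrates this under the assumption that colname is the symbol created by as.name(); the variable names (condition, values, keep) are only illustrative.

```r
# colname is a symbol, as in the function above (after as.name()).
colname <- as.name("SomeColumn")

# R deparses the symbol to the string "SomeColumn" before comparing,
# so this is a single TRUE, not a per-row test.
name_is_not_na_string <- colname != "N/A"

# is.null() asks whether the object itself is NULL; a symbol never is.
name_is_not_null <- !is.null(colname)

# The whole condition collapses to one TRUE before filter_() sees it.
condition <- name_is_not_na_string | name_is_not_null

# For comparison: is.na() works element-wise on the column values,
# and both tests must hold, so they are combined with & rather than |.
values <- c("BR", "N/A", NA)
keep <- !is.na(values) & values != "N/A"
```

In other words, the test has to be built as an expression that refers to the column and is evaluated inside the data frame, and missing values are detected with is.na rather than is.null.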
Solution (as guided by @aosmith):
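library(dplyr)     # distinct_(), filter_(), %>%
library(lazyeval)  # interp()

getDistinctColValues <- function(dataset, colname, removeNA = FALSE) {
  # Turn the column-name string into a symbol for the standard-evaluation verbs.
  colname <- as.name(colname)
  retVector <- dataset %>% distinct_(colname)
  if (removeNA == TRUE)
  {
    # Substitute the column symbol for v, giving ~<column> != "N/A".
    # filter_() also drops rows where the comparison is NA, so real NAs go too.
    filter_criteria <- interp(~v != "N/A", v = colname)
    print(filter_criteria)  # debugging aid: shows the interpolated formula
    retVector <- retVector %>% filter_(filter_criteria)
  }
  return(retVector)
}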
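The underscore verbs (distinct_, filter_) have been deprecated since dplyr 0.7.0, so on a current dplyr the same idea could be written without lazyeval. This is a minimal sketch, not the original solution: it assumes dplyr 1.0 or later, and the names get_distinct_col_values, remove_na, and sample_df are made up for illustration.

```r
library(dplyr)

get_distinct_col_values <- function(dataset, colname, remove_na = FALSE) {
  # Keep only the requested column (given as a string), then drop duplicates.
  result <- dataset %>%
    select(all_of(colname)) %>%
    distinct()

  if (remove_na) {
    # .data[[colname]] looks the column up by its string name inside filter().
    # Drop both true NA values and literal "N/A" strings.
    result <- result %>%
      filter(!is.na(.data[[colname]]), .data[[colname]] != "N/A")
  }

  result
}

# Hypothetical sample data containing a duplicate, an "N/A" string, and a real NA.
sample_df <- data.frame(
  SomeColumn = c("BR", "ET", "N/A", NA, "BR"),
  stringsAsFactors = FALSE
)

cleaned <- get_distinct_col_values(sample_df, "SomeColumn", remove_na = TRUE)
```

The explicit !is.na() test is redundant in practice, because filter() already drops rows where the condition is NA, but it makes the intent visible to the reader.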