The base::levels help file (https://stat.ethz.ch/R-manual/R-devel/library/base/html/levels.html) contains the following example of modifying the levels of a variable:
z <- gl(3, 2, 12, labels = c("apple", "salad", "orange"))
z
levels(z) <- c("fruit", "veg", "fruit")
z
Suppose that this factor sits inside a data frame:
mydata <- data.frame(z=gl(3, 2, 12, labels = c("apple", "salad", "orange")), n=1:12)
I want to write a function that takes the data frame and the variable name as inputs and converts the levels:
modify_levels <- function(df,varname,from,to) {
### MAGIC HAPPENS
}
so that modify_levels(mydata,z,from=c("apple","orange"),to="fruit") does one part of the transformation, and modify_levels(mydata,z,from=c("salad","broccoli"),to="veg") does the second part, even though the level broccoli may not exist in my data set.
With some non-standard evaluation voodoo I can zoom in on what I need to modify:
where_are_levels <- function(df,varname,from,to,verbose=FALSE) {
  # input checks
  if ( !is.data.frame(df) ) {
    stop("df is not a data frame")
  }
  if ( !is.factor(eval(substitute(varname),df)) ) {
    stop("df$varname is not a factor")
  }
  if (verbose==TRUE) {
    cat("df$varname is",
        paste0(substitute(df),"$",substitute(varname)))
    cat(" which evaluates to:\n")
    print( eval(substitute(varname),df) )
  }
  if (length(to)!=1) {
    stop("Substitution is ambiguous")
  }
  # figure out what the cases are with the supplied source values
  for (val in from) {
    r <- (eval(substitute(varname),df) == val)
    if (verbose==TRUE) {
      print(r)
      cat( paste0(substitute(df),"$",substitute(varname)),"==",val)
      cat(": ",sum(r), "case(s)\n")
    }
  }
}
So far, so good (the to argument does nothing yet, though):
> where_are_levels(mydata,z,from=c("apple","orange"),to="",verbose=TRUE)
## df$varname is mydata$z which evaluates to:
## [1] apple apple salad salad orange orange apple apple salad salad orange orange
## Levels: apple salad orange
## [1] TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
## mydata$z == apple: 4 case(s)
## [1] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
## mydata$z == orange: 4 case(s)
Now, for the next step, I think I need to append an additional level to the levels of the target variable, and then change the values of that variable. In interactive work, I would do:
# to <- "fruit" # passed as a function argument
l1 <- levels(mydata$z)
levels(mydata$z) <- union(l1,to)
mydata[r,"z"] <- to
of which I can only reproduce the first line programmatically, inside the for (val in from) loop:
l1 <- levels(eval(substitute(varname),df))
I do not know how to write the two assignments that follow it in the same way.
Note that I want to keep the existing levels apple and orange rather than replace the whole set of levels at once (as the example in the help file does).
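For concreteness, the interactive steps above might be wrapped up roughly as follows. This is only a sketch under some assumptions: the unquoted column name is turned into a string with deparse(substitute(varname)) so that df[[...]] can appear on the left-hand side of an assignment, all source values are handled at once with %in% instead of a loop, and the function returns a modified copy that the caller has to assign back.

```r
# Sketch of modify_levels(); one possible approach, not a definitive answer.
modify_levels <- function(df, varname, from, to) {
  stopifnot(is.data.frame(df), length(to) == 1)
  # Capture the unquoted column name as a string, e.g. z -> "z"
  vname <- deparse(substitute(varname))
  if (!is.factor(df[[vname]])) {
    stop(vname, " is not a factor")
  }
  # Rows whose current value is one of the source values;
  # values absent from the data (e.g. "broccoli") match nothing
  r <- df[[vname]] %in% from
  # Append the target level while keeping the existing levels
  levels(df[[vname]]) <- union(levels(df[[vname]]), to)
  # Recode the matching rows
  df[[vname]][r] <- to
  df
}

mydata <- data.frame(z = gl(3, 2, 12, labels = c("apple", "salad", "orange")),
                     n = 1:12)
mydata <- modify_levels(mydata, z, from = c("apple", "orange"), to = "fruit")
mydata <- modify_levels(mydata, z, from = c("salad", "broccoli"), to = "veg")
levels(mydata$z)
table(mydata$z)
```

The old levels stay in the factor with zero counts; wrapping the result in droplevels() would remove them if that were wanted.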
If the solution is easier to achieve by programming with dplyr from scratch, that is fine with me (although my understanding is that NSE is even more hardcore in dplyr than in base R).
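On the dplyr side, a comparable sketch might look like the one below. It assumes dplyr 1.0 or later, where the embrace operator {{ }} and := can be used inside mutate(); the name modify_levels_dplyr is purely illustrative, and the sketch does not guard against the data frame having columns that happen to be called from or to.

```r
library(dplyr)

# Hypothetical dplyr variant; the function name is illustrative only.
modify_levels_dplyr <- function(df, varname, from, to) {
  stopifnot(is.data.frame(df), length(to) == 1)
  df %>%
    mutate({{ varname }} := factor(
      # Recode on the character values, leaving non-matching values as they are
      replace(as.character({{ varname }}), {{ varname }} %in% from, to),
      # Keep the existing levels and append the target level
      levels = union(levels({{ varname }}), to)
    ))
}

mydata2 <- data.frame(z = gl(3, 2, 12, labels = c("apple", "salad", "orange")),
                      n = 1:12)
out <- mydata2 %>%
  modify_levels_dplyr(z, from = c("apple", "orange"), to = "fruit") %>%
  modify_levels_dplyr(z, from = c("salad", "broccoli"), to = "veg")
levels(out$z)
table(out$z)
```

Because the data frame is the first argument, the function chains with the pipe, and the input data frame itself is left unchanged.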