
I use the following code (LINK) to clean up potentially troublesome aspects of a hypothetical data frame called dataframe:

library(data.table)  # provides fread()

dataframe <- fread(
    "A   B  B.x  C  D   E   iso   year   
     0   3   NA  1  NA  NA  NLD   2009   
     1   4   NA  2  NA  NA  NLD   2009   
     0   5   NA  3  NA  NA  AUS   2011   
     1   5   NA  4  NA  NA  AUS   2011   
     0   0   NA  7  NA  NA  NLD   2008   
     1   1   NA  1  NA  NA  NLD   2008   
     0   1   NA  3  NA  NA  AUS   2012   
     0   NA  1   NA  1  NA  ECU   2009   
     1   NA  0   NA  2  0   ECU   2009   
     0   NA  0   NA  3  0   BRA   2011   
     1   NA  0   NA  4  0   BRA   2011   
     0   NA  1   NA  7  NA  ECU   2008   
     1   NA  0   NA  1  0   ECU   2008   
     0   NA  0   NA  3  2   BRA   2012   
     1   NA  0   NA  4  NA  BRA   2012",
   header = TRUE
)

dataframe <- as.data.frame(dataframe)
## get mode of all vars
var_mode <- sapply(dataframe, mode)
## produce error if complex or raw is found
if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
## get class of all vars
var_class <- sapply(dataframe, class)
## produce error if an "AsIs" object has "logical" or "character" mode
if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
  stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
## identify columns that needs be coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
## coerce logical / character to factor with `as.factor`
dataframe[ind1] <- lapply(dataframe[ind1], as.factor)

Because I use this a lot, I would prefer to put it in a function, so I tried the following:

cleanfunction <- function(dataframe) {
dataframe <- as.data.frame(dataframe)
## get mode of all vars
var_mode <- sapply(dataframe, mode)
## produce error if complex or raw is found
if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
## get class of all vars
var_class <- sapply(dataframe, class)
## produce error if an "AsIs" object has "logical" or "character" mode
if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
  stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
## identify columns that needs be coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
## coerce logical / character to factor with `as.factor`
dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
}

dfclean <- cleanfunction(dataframe)

However, this returns a list containing only the variables that were converted to factors, rather than the data frame with those variables converted.

How can I solve this?

Tom

1 Answer


A function returns the value of the last expression it evaluates. In this case the last expression evaluated is

dataframe[ind1] <- lapply(dataframe[ind1], as.factor)

and an assignment with <- always evaluates (invisibly) to its right-hand side value. So your function returns the result of the lapply call, not the updated data frame.
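To make this behaviour concrete, here is a minimal sketch using a toy function and a toy data frame. The names demo and toy are made up for illustration and do not come from the question.

```r
# Toy function (hypothetical) whose last expression is a subset assignment
demo <- function(df) {
  df["a"] <- lapply(df["a"], as.factor)
}

# Toy data (hypothetical): one character column, one integer column
toy <- data.frame(a = c("x", "y"), b = 1:2, stringsAsFactors = FALSE)

res <- demo(toy)

is.data.frame(res)  # FALSE: the function returned the lapply result
is.list(res)        # TRUE
names(res)          # "a": only the converted column, column b is gone
```

The assignment inside demo does update the local copy of df, but that copy is discarded when the function exits, because the value handed back is the right-hand side of the assignment.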

You just need to add another line that says

return(dataframe)

or just

dataframe

to the end of your function.
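Putting it together, the function from the question could look roughly like this. It is a sketch that reuses the question's own checks and only adds the final line; example_df is a small made-up data frame so that the snippet runs without data.table.

```r
cleanfunction <- function(dataframe) {
  dataframe <- as.data.frame(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
    stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
  ## identify columns that need to be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor with `as.factor`
  dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
  ## return the updated data frame, not the value of the assignment above
  dataframe
}

## hypothetical example data, not the data from the question
example_df <- data.frame(
  A    = c(0, 1),
  iso  = c("NLD", "AUS"),
  flag = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

dfclean <- cleanfunction(example_df)
is.data.frame(dfclean)  # TRUE
sapply(dfclean, class)  # A stays numeric; iso and flag become factors
```

Whether to write return(dataframe) or a bare dataframe on the last line is a matter of style; both give the same result here.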

MrFlick