
R often reads data frame columns in with the "wrong" class, or a column has to be changed from factor to character before it can be modified. Until now I have been changing column classes in the following way:

set.seed(1)

df <- data.frame(x = 1:10,
                 y = rep(1:2, 5),
                 k = rnorm(10, 5, 2),
                 z = rep(c(2010, 2012, 2011, 2010, 1999), 2),
                 j = c(rep(c("a", "b", "c"), 3), "d"))

x <- c("y", "z")

# seq_along() is safe even when x is empty, unlike 1:length(x)
for (i in seq_along(x)) {
  df[, x[i]] <- factor(df[, x[i]])
}

And back to numeric:

x <- 1:5

for (i in seq_along(x)) {
  # Column j holds letters, so it cannot become numeric:
  # it turns into NA with a coercion warning
  df[, x[i]] <- as.numeric(as.character(df[, x[i]]))
}
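For comparison, the two loops above can be written without an index variable by assigning the result of `lapply()` back into the selected columns. This is only a sketch on a cut-down copy of the example data; the names `cols`, `df_fac` and `df_num` are made up for illustration.

```r
df <- data.frame(x = 1:10,
                 y = rep(1:2, 5),
                 z = rep(c(2010, 2012, 2011, 2010, 1999), 2))

cols <- c("y", "z")   # columns to convert, selected by name

# To factor: df[cols] keeps a data frame even when cols has length one
df_fac <- df
df_fac[cols] <- lapply(df_fac[cols], factor)

# And back to numeric, going through character to recover the values
df_num <- df_fac
df_num[cols] <- lapply(df_num[cols], function(f) as.numeric(as.character(f)))

str(df_fac)
str(df_num)
```

Columns outside `cols` are left untouched, which avoids the problem of pushing a letter column through `as.numeric`.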

It occurred to me that there may be a better way of doing this. I found this question, whose answer is almost exactly what I need:

convert.magic <- function(obj, types){
  out <- lapply(1:length(obj), FUN = function(i){
    FUN1 <- switch(types[i],
                   character = as.character,
                   numeric = as.numeric,
                   factor = as.factor)
    FUN1(obj[, i])
  })
  names(out) <- colnames(obj)
  as.data.frame(out)
}

However, this function needs a type to be specified for every column:

convert.magic(df, rep("factor",5))

convert.magic(df, c("character", "factor"))
# Error in FUN(1:5[[1L]], ...) : could not find function "FUN1"
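The error most likely arises because `types` has only two entries while the function loops over all five columns: for the third column `types[i]` is `NA`, `switch()` matches nothing, and `FUN1` ends up as `NULL` rather than a function. The sketch below shows the two ingredients separately and one possible guard. `check_types` is a hypothetical helper, not part of the original function, and recycling the types across columns is an assumption about the desired behaviour.

```r
types <- c("character", "factor")
third <- types[3]          # NA: there is no entry for the third column

# With no match and no default, switch() returns NULL (invisibly)
FUN1 <- switch("logical",
               character = as.character,
               numeric   = as.numeric,
               factor    = as.factor)
is.null(FUN1)

# Hypothetical guard: recycle to one type per column, fail early on unknown names
check_types <- function(obj, types) {
  types <- rep_len(types, ncol(obj))
  bad <- setdiff(types, c("character", "numeric", "factor"))
  if (length(bad) > 0) {
    stop("unknown type(s): ", paste(bad, collapse = ", "))
  }
  types
}

check_types(data.frame(a = 1, b = 2, c = 3), c("character", "factor"))
```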

Could somebody please help me rebuild this function so that it applies one type to a set of columns given by name or by number? I am afraid this is too advanced for me...

x <- c("y", "z")
convert.magic(df, "character", x)
Mikko
    If you're only converting factors to numeric, from `?factor`: "To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f))." That also indicates that `convert.magic` might produce unexpected results in certain circumstances. – BenBarnes Jun 29 '12 at 12:10
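To illustrate the point in this comment, here is a small sketch using the year values from column `z` of the question. The variable names are chosen for illustration only.

```r
# Year-like values stored as a factor
f <- factor(c(2010, 2012, 2011, 2010, 1999))

codes    <- as.numeric(f)                 # internal level codes, not the years
via_char <- as.numeric(as.character(f))   # original values, via character
via_lvls <- as.numeric(levels(f))[f]      # original values, as suggested in ?factor

print(codes)
print(via_char)
print(via_lvls)
```

Calling `as.numeric` directly on a factor returns the position of each value among the sorted levels, which is why a plain `numeric = as.numeric` branch can silently give wrong numbers when the input is a factor.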

1 Answer

df <- data.frame(x = 1:10,
                 y = rep(1:2, 5),
                 k = rnorm(10, 5,2),
                 z = rep(c(2010, 2012, 2011, 2010, 1999), 2),
                 j = c(rep(c("a", "b", "c"), 3), "d"))

convert.magic <- function(obj, type){
  FUN1 <- switch(type,
                 character = as.character,
                 numeric = as.numeric,
                 factor = as.factor)
  out <- lapply(obj, FUN1)
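  # On R < 4.0, as.data.frame() turns character columns back into factors
  # unless stringsAsFactors = FALSE is given
  as.data.frame(out, stringsAsFactors = FALSE)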
}

str(df)
str(convert.magic(df, "character"))
str(convert.magic(df, "factor"))
df[, c("x", "y")] <- convert.magic(df[, c("x", "y")], "factor")
Thierry
    This converts the entire data.frame. A slight modification is closer to what I was after: `convert.magic <- function(obj, type, cols){ FUN1 <- switch(type, character = as.character, numeric = as.numeric, factor = as.factor); obj[,cols] <- lapply(obj[,cols], FUN1); as.data.frame(obj) }` How could BenBarnes' suggestion (`as.numeric(levels(f))[f]`) be added to this function? – Mikko Jun 29 '12 at 16:09
    @Largh Instead of using `as.numeric` in the `switch` statement you would probably write a simple wrapper that checks whether its input is a factor or not. If it is, use Ben's method, otherwise just use `as.numeric`. – joran Jun 29 '12 at 16:27
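Putting the comments together, one way this could look is sketched below. It is not the answerer's code: `as_numeric_safe` is a hypothetical wrapper name, the `cols` argument defaulting to all columns is an assumption, and the unnamed `stop()` branch is added here so that an unknown type fails with a clear message.

```r
# Wrapper along the lines joran describes: use the ?factor method for factors,
# plain as.numeric for everything else
as_numeric_safe <- function(v) {
  if (is.factor(v)) as.numeric(levels(v))[v] else as.numeric(v)
}

convert.magic <- function(obj, type, cols = names(obj)) {
  FUN1 <- switch(type,
                 character = as.character,
                 numeric   = as_numeric_safe,
                 factor    = as.factor,
                 stop("unsupported type: ", type))   # default branch
  # obj[cols] accepts names or positions and never drops to a vector
  obj[cols] <- lapply(obj[cols], FUN1)
  obj
}

df <- data.frame(x = 1:10,
                 y = rep(1:2, 5),
                 z = rep(c(2010, 2012, 2011, 2010, 1999), 2),
                 j = c(rep(c("a", "b", "c"), 3), "d"))

df <- convert.magic(df, "factor", c("y", "z"))   # columns by name
df <- convert.magic(df, "numeric", 2:3)          # columns by position, back to numeric
str(df)
```

Indexing with `obj[cols]` rather than `obj[, cols]` matters when a single column is selected, because `obj[, "y"]` would return a bare vector and `lapply` would then iterate over its elements. Converting a letter column such as `j` to numeric still yields `NA` with a warning.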