I want to droplevels a dataframe (please do not mark this question as a duplicate :)). Of all the methods I have tried, only one works. What am I doing wrong? Example:

> df = data.frame(x = (c("a","b","c")),y=c("d","e","f"))
> class(df$x)
[1] "factor"
> levels(df$x)
[1] "a" "b" "c"

Method 1 not working:

> df1 = droplevels(df)
> class(df1$x)
[1] "factor"
> levels(df1$x)
[1] "a" "b" "c"

Method 2 not working:

> df2 = as.data.frame(df, stringsAsFactors = FALSE) 
> class(df2$x)
[1] "factor"
> levels(df2$x)
[1] "a" "b" "c"

Method 3 not working:

> df3 = df
> df3$x = factor(df3$x) 
> class(df3$x)
[1] "factor"
> levels(df3$x)
[1] "a" "b" "c"

Method 4 finally works:

> df4 = df
> df4$x = as.vector(df4$x)
> class(df4$x)
[1] "character"
> levels(df4$x)
NULL

Although it works, I think method 4 is the least elegant. Can you help me debug this? Many thanks

EDIT: Following the comments and answers: I want to remove the factor structure from a data frame, not only drop levels.

IRTFM
MasterJedi
  • So when you say you want to `droplevels` you really just mean you want to convert the factor variable to a character variable. If so, Method 4 is the only systematically correct choice. `droplevels` removes unobserved levels from a factor, but in your test case you observe all levels, so nothing would be dropped. If you don't want them to be factors in the first place, use `df = data.frame(x = (c("a","b","c")),y=c("d","e","f"), stringsAsFactors=FALSE)`. Method 2 does not work because they are already factors at that point. What *exactly* is your goal? – MrFlick Sep 22 '14 at 17:26
  • @MrFlick, thanks for the explanation; however, it is still strange that method 2 does not work – MasterJedi Sep 22 '14 at 17:31
  • @YujiaHu Not strange at all. If you pass `as.data.frame` a data.frame all it does is adjust the class attribute and (possibly) the row names. – joran Sep 22 '14 at 17:34
  • Like I said, Method 2 does not work because `df` already had factors when it was created. The `stringsAsFactors=` parameter only affects character vectors, not vectors that are already factors. – MrFlick Sep 22 '14 at 17:34
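
To make the points in these comments concrete, here is a minimal sketch of what `droplevels` does and does not do. The factor columns are built explicitly with `factor()` so the example behaves the same whatever the `stringsAsFactors` default is; the names `sub`, `sub_dropped`, and `df2` are only illustrative.

```r
# Build factor columns explicitly so the result does not depend on
# the stringsAsFactors default of the running R version.
df <- data.frame(x = factor(c("a", "b", "c")), y = factor(c("d", "e", "f")))

# All three levels are observed, so droplevels() has nothing to remove.
levels(droplevels(df)$x)     # "a" "b" "c"

# After subsetting, "c" is no longer observed but is still a level.
sub <- df[df$x != "c", ]
levels(sub$x)                # "a" "b" "c"

# Now droplevels() removes the unused level; the column is still a factor.
sub_dropped <- droplevels(sub)
levels(sub_dropped$x)        # "a" "b"
class(sub_dropped$x)         # "factor"

# as.data.frame() on an existing data frame leaves factor columns alone,
# because stringsAsFactors only applies to character vectors.
df2 <- as.data.frame(df, stringsAsFactors = FALSE)
class(df2$x)                 # "factor"
```

So `droplevels` only has a visible effect once some level has become unused, and it never changes the class of the column.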

2 Answers

I'm guessing you want:

df[] <- lapply(df, as.character)

This has two differences from your code: the "[]" on the LHS of the assignment, which preserves the dataframe structure of df, and the use of lapply. The droplevels function only drops unused levels; it does not convert to a character vector. The as.character function does not have a data.frame method, so it needs to be (l)-applied to each of the factor vectors rather than to the list of factor vectors as a whole. A more general function to do that (leaving non-factor columns such as numeric vectors untouched rather than coercing them to character) would be:

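 # Convert factor columns to character; leave every other column unchanged
 makefac2char <- function(v) if (is.factor(v)) as.character(v) else v
 df[] <- lapply(df, makefac2char)
 # To make a new dataframe instead (stringsAsFactors = FALSE keeps
 # data.frame() from turning the strings back into factors in R < 4.0.0)
 df2 <- lapply(df, makefac2char)
 df2 <- data.frame(df2, stringsAsFactors = FALSE)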

If you do not want to destructively replace 'df', you need to wrap data.frame() around the lapply results, since lapply returns a plain list and does not keep the data.frame attributes. If you had created the dataframe with 'stringsAsFactors = FALSE' (or set that option with options(stringsAsFactors = FALSE)), you would not have needed to do this on a data.frame-wide basis.
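
The role of "[]" can be checked directly. The following is a small sketch, with the illustrative names `as_list`, `df_inplace`, and `df_new`, comparing the bare `lapply` result with the two ways of getting a data frame back. `stringsAsFactors = FALSE` is passed explicitly because the default was `TRUE` before R 4.0.0.

```r
df <- data.frame(x = factor(c("a", "b", "c")), y = factor(c("d", "e", "f")))

# Without "[]", the result of lapply() is a plain list, not a data frame.
as_list <- lapply(df, as.character)
class(as_list)           # "list"

# With "[]", the columns are replaced in place and the object
# keeps its data.frame class, names, and row names.
df_inplace <- df
df_inplace[] <- lapply(df_inplace, as.character)
class(df_inplace)        # "data.frame"
class(df_inplace$x)      # "character"

# Non-destructive variant: rebuild a data frame from the list and
# stop data.frame() from converting the strings back to factors.
df_new <- data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
class(df_new$x)          # "character"
class(df$x)              # "factor" (the original is untouched)
```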

Ben Bolker
IRTFM
  • The standard `droplevels()` seems to behave just fine over a data.frame `df = data.frame(x = factor(c("a","b","c"), levels=letters),y=c("d","e","f"), z=1:3); droplevels(df)`. Not sure what this accomplishes. – MrFlick Sep 22 '14 at 17:32
  • Pretty sure `droplevels` does have a data.frame method. – joran Sep 22 '14 at 17:32
  • Sorry. The questioner is confused (and confused me) about what `droplevels` does. Edited to give him what he wants; he is using the wrong function to achieve it. He wants `as.character`. – IRTFM Sep 22 '14 at 17:37

"Dropping levels" refers to getting rid of unused factor levels, but keeping the object as class factor. You're looking for a way to convert all factor columns into character columns:

> df2 = data.frame(lapply(df, 
           function(x) if (is.factor(x)) as.character(x) else x), 
              stringsAsFactors = FALSE)
> lapply(df2, class)
$x
[1] "character"

$y
[1] "character"

> df2
  x y
1 a d
2 b e
3 c f
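
The `is.factor` check matters once the data frame also holds non-factor columns. Here is a short sketch under that assumption: the columns `z` and `w` and the helper name `fac_to_chr` are made up for illustration and are not part of the question's data.

```r
# A data frame with one factor column and two numeric columns.
df_mixed <- data.frame(x = factor(c("a", "b", "c")),
                       z = 1:3,
                       w = c(1.5, 2.5, 3.5))

# Hypothetical helper: convert factors to character, pass everything else through.
fac_to_chr <- function(col) if (is.factor(col)) as.character(col) else col

guarded <- df_mixed
guarded[] <- lapply(guarded, fac_to_chr)
sapply(guarded, class)   # x: "character", z: "integer", w: "numeric"

# A blanket as.character() runs without error, but it also turns
# the numeric columns into strings, which is usually not wanted.
blanket <- df_mixed
blanket[] <- lapply(blanket, as.character)
sapply(blanket, class)   # all three columns are "character"
```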
Ben Bolker
Señor O