list all factor levels of a data.frame

Question

With str(data) I get only the head of the levels (1-2 values):

fac1: Factor w/ 2  levels ... :
fac2: Factor w/ 5  levels ... :
fac3: Factor w/ 20 levels ... :
val: num ...

With dplyr::glimpse(data) I get more values, but no information about the number or values of the factor levels. Is there an automatic way to get the level information for all factor variables in a data.frame? That is, a short form with more info for

levels(data$fac1)
levels(data$fac2)
levels(data$fac3)

or, more precisely, an elegant version of something like

for (n in names(data))
  if (is.factor(data[[n]])) {
    print(n)
    print(levels(data[[n]]))
  }

thx Christof

akrun · Accepted Answer · 2015-08-07T13:59:31.313

33

Here are some options. We loop through the columns of 'data' with sapply and get the levels of each column (assuming that all the columns are of class factor):

sapply(data, levels)

Or if we need to pipe (%>%) it, this can be done as

library(dplyr)
data %>% 
     sapply(levels)

Another option is summarise_each from dplyr, where we specify levels within funs. (Both summarise_each and funs are deprecated in current dplyr releases.)

data %>%
     summarise_each(funs(list(levels(.))))
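As a minimal base R sketch of what sapply(data, levels) returns when not every column is a factor: the toy data frame below is hypothetical (its column names only mirror the question), and the Filter step is one possible way to drop the non-factor entries, not part of the original answer.

```r
# Toy data frame (hypothetical): two factor columns and one numeric column.
data <- data.frame(
  fac1 = factor(c("a", "b", "a")),
  fac2 = factor(c("x", "y", "z")),
  val  = c(1.5, 2.5, 3.5)
)

# sapply() returns a named list here because the results differ in length.
lv <- sapply(data, levels)

# Non-factor columns have no levels, so their entry is NULL.
is.null(lv$val)

# Keep only the entries that actually carry levels.
lv_factors <- Filter(Negate(is.null), lv)
lv_factors
```

If all columns happen to have the same number of levels, sapply simplifies the result to a matrix; lapply always returns a list and may be the safer choice in scripts.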

edited Aug 07 '15 at 13:59

answered Dec 28 '14 at 13:58

akrun


How do we get length of all of those levels – BigDataScientist Jun 23 '16 at 17:23
@BigDataScientist check my answer – Amit Kohli Mar 16 '18 at 11:50

score 9 · Answer 2 · answered Dec 07 '18 at 21:36

If your problem is specifically to output a list of all levels for a factor, then I have found a simple solution using unique:

unique(df$x)

For instance, for the infamous iris dataset:

unique(iris$Species)
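One caveat worth checking, sketched here with a hypothetical factor f: unique() works on the values that occur in the data, while levels() reports every declared level, so the two can differ when a level is unused.

```r
# Hypothetical factor with a declared but unused level "c".
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# All declared levels, including the unused one.
levels(f)

# Only the values that actually occur, in order of first appearance.
as.character(unique(f))

# droplevels() removes unused levels, after which both views agree.
levels(droplevels(f))
```

For iris$Species every level occurs, so both approaches list the same three species.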

Djamil Lakhdar-Hamina

score 6 · Answer 3 · edited Mar 31 '21 at 19:14

Or using purrr:

data %>% purrr::map(levels)

Or to first factorize everything:

data %>% dplyr::mutate_all(as.factor) %>% purrr::map(levels)

And answering the question about how to get the lengths:

data %>% purrr::map(levels) %>% purrr::map(length)

Paul

answered Mar 16 '18 at 11:49

Amit Kohli


Nice approach. I like it. – igorkf Jan 29 '20 at 00:53

score 4 · Answer 4 · edited Jul 27 '16 at 10:18

A simpler method is to use the sqldf package and use a select distinct statement. This makes it easier to automatically get the names of factor levels and then specify as levels to other columns/variables.

Generic code snippet is:

library(sqldf)
array_name = sqldf("select DISTINCT *colname1* as '*column_title*' from *table_name*")

Sample code using iris dataset:

df1 = iris
factor1 <- sqldf("select distinct Species as 'flower_type' from df1")
factor1    ## to print the names of factors

Output:

  flower_type
1      setosa
2  versicolor
3   virginica

doncherry

answered Jul 15 '16 at 12:51

Ann Rajaram

If you indent each code line by 4 spaces it will format itself properly. – G. Grothendieck Jul 16 '16 at 03:01

score 2 · Answer 5 · answered Jan 01 '20 at 17:16

In case you want to display factor levels only for those columns which are factors, you can use:

lapply(df[sapply(df, is.factor)], levels)
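The one-liner above can be wrapped in a small helper and tried on the built-in iris and mtcars datasets. The name factor_levels is hypothetical, and vapply is swapped in for sapply only to guarantee a logical vector; this is a sketch, not the answer's own code.

```r
# Hypothetical helper: named list of levels for every factor column of df.
factor_levels <- function(df) {
  # vapply() always returns a logical vector, even for a zero-column df.
  is_fac <- vapply(df, is.factor, logical(1))
  lapply(df[is_fac], levels)
}

# iris has a single factor column, Species.
res <- factor_levels(iris)
str(res)

# mtcars has no factor columns, so the result is an empty list.
none <- factor_levels(mtcars)
length(none)
```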

Peter

score 0 · Answer 6 · answered Jun 18 '20 at 04:52

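Alternate option to get the number of levels of each column in a data.frame:

# data[[x]] extracts the column as a vector (data[, x] would stay a tibble for tibbles)
data_levels_length <- sapply(seq_len(ncol(data)), function(x) {
  length(levels(data[[x]]))
})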
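A shorter base R variant of the same idea, shown on the built-in iris dataset as a sketch: nlevels() returns the number of levels directly and gives 0 for columns that are not factors, and iterating over the data frame itself keeps the column names.

```r
# Number of levels per column; non-factor columns report 0.
n_levels <- vapply(iris, nlevels, integer(1))
n_levels

# Restrict the result to columns that have at least one level.
n_levels[n_levels > 0]
```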

Jay J

score 0 · Answer 7 · answered Mar 31 '21 at 19:22

As a long data frame (tibble):

df %>% gather(name, value) %>% count(name, value)

This converts all the columns into name-value pairs, and then counts the unique levels.

Subset column types with something like:

df %>% select_if(is.character) %>% ...

Via https://stackoverflow.com/a/47122651/3217870
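For comparison, a similar long name/value/count table can be sketched in base R without tidyr. The toy data frame and the column names name, value and n are assumptions chosen to mirror the pipeline above; unlike gather, this version looks only at factor columns and keeps unused levels with a count of 0.

```r
# Toy data frame (hypothetical): two factor columns and one numeric column.
df <- data.frame(
  fac1 = factor(c("a", "b", "a")),
  fac2 = factor(c("x", "x", "y")),
  val  = c(1, 2, 3)
)

# Names of the factor columns only.
fcols <- names(df)[vapply(df, is.factor, logical(1))]

# One small data frame per factor column, stacked into a long table.
long <- do.call(rbind, lapply(fcols, function(n) {
  counts <- table(df[[n]])  # counts per level, including unused levels
  data.frame(
    name  = n,
    value = names(counts),
    n     = as.integer(counts),
    stringsAsFactors = FALSE
  )
}))
long
```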

Driele Delanira · Answer 8 · 2023-02-07T18:13:55.427

library(dplyr) #for all the following

df$factor %>% unique() %>% str()

Lists and counts the frequency of the levels of a specific variable:

count(df,variable)

Returns a table with the levels of a specific variable and their frequencies. The number of rows tells you how many levels occur for this variable.

count(df,across())

Returns a table of all combinations of variable levels that co-occur in observations, together with the frequency of each combination.
