
Apply function table() to each column of a data.frame using dplyr

I often apply the table() function to each column of a data frame using plyr, like this:

library(plyr)
ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) )  )

Is it possible to do this in dplyr also?

My attempts fail:

mtcars %>%  do( table %>% data.frame() )
melt( mtcars ) %>%  do( table %>% data.frame() )
Rasmus Larsen
  • You could convert this to `long` form using `gather` from `library(tidyr)` and then do `gather(mtcars, Var, Val) %>% group_by(Var) %>% dplyr::mutate(n=n()) %>% group_by(Var,Val) %>% dplyr::mutate(n1=n(), Percent=n1/n)%>% unique()` – akrun Dec 26 '14 at 17:34
  • Can you post a full answer using this approach? – userJT Jun 07 '19 at 19:30

4 Answers


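You can try the following, which does not rely on the tidyr package.

library(dplyr)

mtcars %>% 
   lapply(table) %>%                      # one frequency table per column
   lapply(as.data.frame) %>%              # each becomes a data frame with Var1 and Freq
   Map(cbind, var = names(mtcars), .) %>% # prepend the column name as `var`
   bind_rows() %>%                        # was rbind_all() in older dplyr versions
   group_by(var) %>% 
   mutate(pct = Freq / sum(Freq))         # share of each value within its column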
kdauria
Caner
  • Can you elaborate on the answer? I am getting some errors due to a messier input data.frame and would like to troubleshoot. Can I use `purrr::map` instead of `Map`? The error is `Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 1, 0` – userJT Jun 07 '19 at 19:25
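The error quoted in that comment is consistent with a column that is entirely NA: table() drops NA by default, so such a column produces a zero-row data frame, and cbind() cannot pair it with a single `var` value. A minimal sketch, assuming that is the cause, which swaps Map() for purrr::imap() and keeps NA as its own level with useNA = "ifany". The small data frame `df` is hypothetical test data.

```r
library(dplyr)
library(purrr)

# Hypothetical input: column b is entirely NA, which breaks the Map()/cbind() version
df <- data.frame(a = c(1, 1, 2), b = c(NA, NA, NA))

freq <- df %>%
  imap(function(x, nm) {
    # useNA = "ifany" keeps an NA row, so an all-NA column gives 1 row instead of 0
    tab <- as.data.frame(table(x, useNA = "ifany"), stringsAsFactors = FALSE)
    tab$var <- rep(nm, nrow(tab))  # rep() also works if tab has zero rows
    tab
  }) %>%
  bind_rows() %>%
  group_by(var) %>%
  mutate(pct = Freq / sum(Freq)) %>%
  ungroup()

freq
```

imap() passes each column together with its name, which replaces the separate `names(mtcars)` argument that Map() needed.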

Using tidyverse (dplyr and purrr):

library(tidyverse)

mtcars %>%
    map( function(x) table(x) )

Or:

mtcars %>%
    map(~ table(.x) )

Or simply:

library(tidyverse)

mtcars %>%
    map( table )
Rasmus Larsen
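map() returns a named list of table objects, one per column, rather than the single data frame that ldply() builds. A minimal sketch of getting both counts and proportions from that list, reusing base R's prop.table() in a second map():

```r
library(purrr)

counts <- map(mtcars, table)       # named list: one frequency table per column
props  <- map(counts, prop.table)  # same list with counts turned into proportions

counts$cyl  # number of 4-, 6- and 8-cylinder cars
props$cyl   # the same as shares of all 32 cars
```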

In general, you probably would not want to run table() on every column of a data frame, because at least one of the variables is likely to be unique (an id field) and would produce very long output. However, you can use group_by() and tally() to obtain frequency tables in a dplyr chain, or use count(), which does the group_by() for you.

mtcars %>% 
    group_by(cyl) %>% 
    tally()
# or equivalently: mtcars %>% count(cyl)

Source: local data frame [3 x 2]

  cyl  n
1   4 11
2   6  7
3   8 14
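The question also asks for proportions, which the plyr version got from prop.table(). A minimal sketch of the same idea with count(): divide each count by the column total inside mutate().

```r
library(dplyr)

# count() groups by cyl and tallies; n / sum(n) turns the counts into proportions
cyl_freq <- mtcars %>%
  count(cyl) %>%
  mutate(prop = n / sum(n))

cyl_freq
```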

If you want to do a two-way frequency table, group by more than one variable.

mtcars %>% 
    group_by(gear, cyl) %>% 
    tally()
# or equivalently: mtcars %>% count(gear, cyl)

You can use spread() from the tidyr package to turn that two-way output into the wide layout that table() produces when given two variables.
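A short sketch of that reshaping, using tidyr's spread() (superseded in newer tidyr by pivot_wider(), but still available): count gear by cyl in long form, then spread the cyl values into columns. `fill = 0L` writes 0 for the gear/cyl combination that never occurs, matching the zero cell in the base table() output.

```r
library(dplyr)
library(tidyr)

# Long form: one row per observed (gear, cyl) combination with its count n
gear_cyl <- mtcars %>% count(gear, cyl)

# One column per cyl value; combinations absent from the data become 0
gear_cyl_wide <- gear_cyl %>% spread(cyl, n, fill = 0L)

gear_cyl_wide
table(mtcars$gear, mtcars$cyl)  # base R cross-tabulation for comparison
```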

josiekre
  • `mtcars %>% count(cyl)` or `mtcars %>% count(gear, cyl)`. I think the question is how to do this for every variable in one call. – Tunn Dec 20 '16 at 15:07
  • Fair enough; but I just wanted to point out that usually running this on every single column will result in really, really long output. At least one of the columns is likely to be a unique id variable. I updated my answer to include the use of `count` since it does the `group_by` for you. Thanks! – josiekre Dec 21 '16 at 16:10
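One way to run count() on every column in a single pipeline, as the comment asks, is to loop over the column names with purrr::map() and stack the results. This sketch assumes all columns can share one `value` column (true for mtcars, where every column is numeric); the names `value`, `var` and `prop` are illustrative choices.

```r
library(dplyr)
library(purrr)

# One count() per column, stacked into a single long table.
# The column name arrives as a string and is looked up with the .data pronoun.
all_counts <- names(mtcars) %>%
  map(function(v) {
    mtcars %>%
      count(value = .data[[v]]) %>%       # group by that column's values, named "value"
      mutate(var = v, prop = n / sum(n))  # record the source column and its shares
  }) %>%
  bind_rows()

all_counts %>% filter(var == "cyl")
```

To respect the warning about id-like columns, you could pass a subset such as `c("cyl", "gear", "am")` instead of `names(mtcars)`.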

The solution by Caner did not work for me, but the approach from commenter akrun (credit goes to him) worked great. I demo it on a much larger tibble (nycflights13's flights), and I also sorted each variable's values by frequency, descending.

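library(dplyr)
library(tidyr)
library(nycflights13)
dim(flights)

tte <- gather(flights, Var, Val) %>%               # long form: one row per (column, value)
  group_by(Var) %>% dplyr::mutate(n = n()) %>%     # rows per column
  group_by(Var, Val) %>% dplyr::mutate(n1 = n(), Percent = n1 / n) %>%
  arrange(Var, desc(n1)) %>%                       # most frequent values first
  unique()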
userJT
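In current tidyr, gather() is superseded by pivot_longer(), and the two mutate(n()) steps plus unique() can be collapsed into a single count(). A sketch of that variant, keeping the Var/Val/n1/Percent names from the answer and using mtcars because all its columns are numeric. For flights, whose columns have mixed types, pivot_longer() will refuse to combine incompatible types into one column, so you would likely need to convert them first, for example with mutate(across(everything(), as.character)).

```r
library(dplyr)
library(tidyr)

freq_long <- mtcars %>%
  pivot_longer(everything(), names_to = "Var", values_to = "Val") %>%  # one row per (column, value)
  count(Var, Val, name = "n1") %>%      # occurrences of each value within each column
  group_by(Var) %>%
  mutate(Percent = n1 / sum(n1)) %>%    # share of that column's rows
  arrange(Var, desc(n1)) %>%            # most frequent values first
  ungroup()

freq_long %>% filter(Var == "cyl")
```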