4

dplyr is fast, and I would like to use its %.% piping a lot. I want the equivalent of table() (a frequency count per value) that preserves the column name and returns a data.frame.

How can I achieve the same result as the code below using only dplyr functions? (Imagine a huge data.table, BIGiris, with 6M rows.)

> out<-as.data.frame(table(iris$Species))
> names(out)[1]<-'Species'
> names(out)[2]<-'my_cnt1'
> out

The output is shown below. Notice that I have to rename column 1 back to its original name. Also, in a dplyr mutate or similar call, I would like to be able to specify the name of my new count column.

     Species my_cnt1
1     setosa      50
2 versicolor      50
3  virginica      50

Imagine joining it to a table like this one (assume the iris data.frame has 6M rows and Species is more like a "species_ID"):

> habitat<-data.frame(Species=c('setosa'),lives_in='sea')

The final join and its output (for joining, I need the column names preserved at all times):

> left_join(out,habitat)
Joining by: "Species"
     Species my_cnt1 lives_in
1     setosa      50      sea
2 versicolor      50     <NA>
3  virginica      50     <NA>
userJT

3 Answers

10

For the first part, you can use dplyr like this:

library(dplyr)
out <- iris %>% group_by(Species) %>% summarise(my_cnt1 = n())
out

Source: local data frame [3 x 2]

     Species my_cnt1
1     setosa      50
2 versicolor      50
3  virginica      50

To continue in one chain, do this:

out <- iris %>% group_by(Species) %>% summarise(my_cnt1 = n()) %>% left_join(habitat)
out

Source: local data frame [3 x 3]

     Species my_cnt1 lives_in
1     setosa      50      sea
2 versicolor      50       NA
3  virginica      50       NA

By the way, dplyr now uses %>% in place of %.%. It does the same thing and comes from the magrittr package, which dplyr re-exports.
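Since the question asks for a plain data.frame and for stable column names when joining, the chain above could be extended as in the sketch below. This is only an illustration, not part of the original answer: it assumes a reasonably recent dplyr, names the join key explicitly with `by = "Species"`, and finishes with `as.data.frame()` to drop the tbl class. The `habitat` table is the one defined in the question.

```r
library(dplyr)

# Lookup table from the question; only setosa has a habitat entry.
habitat <- data.frame(Species = "setosa", lives_in = "sea")

out <- iris %>%
  group_by(Species) %>%
  summarise(my_cnt1 = n()) %>%          # one row per Species, count column named here
  left_join(habitat, by = "Species") %>% # explicit key, so no "Joining by" message
  as.data.frame()                        # convert the result to a plain data.frame

print(out)
```

Naming the key in `by` also protects the join if the two tables later gain other columns with the same name.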

ecologician
tally() is great! Using tally() though, if you wanted a custom column name I suppose you would have to do this: `out <- iris %>% group_by(Species) %>% tally() %>% select(Species, my_cnt1 = n)`. – ecologician Jun 25 '14 at 23:43
0

count() may be a convenient option to get behavior similar to table():

iris %>% 
  group_by(Species) %>% 
  count(name="my_cnt1")
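As a possible shortcut, `count()` also accepts the grouping column directly, so the separate `group_by()` step can be dropped. The sketch below assumes a dplyr version in which `count()` has the `name` argument (0.8.0 or later, as far as I know); the result is not left grouped by Species.

```r
library(dplyr)

# Count rows per Species and name the count column in a single call.
out <- iris %>% count(Species, name = "my_cnt1")

print(out)
```

Because the column keeps the name `Species`, the result can go straight into `left_join()` against another table keyed on that column.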

For table()-like output with two factors:

library(tidyr)  # pivot_wider() lives in tidyr, not dplyr

iris %>% 
  group_by(Species) %>% 
  count(Petal.Width) %>% 
  pivot_wider(names_from = Petal.Width, values_from=n)
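One difference from `table()` is worth illustrating: combinations that never occur come out of `pivot_wider()` as NA rather than 0. A minimal sketch of one way to get zeros, assuming a tidyr version (1.1.0 or later, I believe) that accepts a scalar for `values_fill`:

```r
library(dplyr)
library(tidyr)

wide <- iris %>%
  count(Species, Petal.Width) %>%          # one row per observed combination
  pivot_wider(names_from = Petal.Width,
              values_from = n,
              values_fill = 0)             # unobserved combinations become 0, as in table()

print(wide)
```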
Brian D
-1

Or you can simply attach the data frame and then run the table function. This displays the column name too.

> attach(iris)
> table(Species)
 Species
    setosa versicolor  virginica 
        50         50         50
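To get from this base R route to the data.frame the question asks for, one option (a sketch, not taken from the answer above) is to name the table's dimension in the call and use the `responseName` argument of `as.data.frame()`. That sets both column names in one step and does not need `attach()`:

```r
# Naming the argument to table() sets the first column's name;
# responseName sets the name of the count column.
out <- as.data.frame(table(Species = iris$Species), responseName = "my_cnt1")

print(out)
```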