count the occurrence of categorical variables in R

Question

I have a data frame consisting of three categorical variables. I want to find the frequency of each combination and sort the result by frequency in descending order, as follows.

My data:

   A LEVEL1 PASS
   A LEVEL1 FAIL
   B LEVEL2 PASS
   A LEVEL1 PASS
   B LEVEL2 PASS
   A LEVEL1 PASS

The result should be as follows:

   A LEVEL1 PASS 3
   B LEVEL2 PASS 2
   A LEVEL1 FAIL 1

I use the plyr library:

  myfreq <- count(myresult, vars = NULL, wt_var = NULL)
  myfreq <- myfreq[order(-myfreq$freq), ]

It worked at first, but now it just gives me this error:

Error in grouped_df_impl(data, unname(vars), drop) : Column vars is unknown

The other libraries I use are rJava and dplyr.

Thanks.

score 4 · Accepted Answer · answered Nov 21 '17 at 10:03

I would suggest using dplyr, which is part of the tidyverse.

I don't know the names of the columns in your data frame, so I named them col1, col2 and col3 in the following example.

library(tidyverse)

# same six rows as in the question
df <- tribble(
  ~col1, ~col2, ~col3,
  "A", "LEVEL1", "PASS",
  "A", "LEVEL1", "FAIL",
  "B", "LEVEL2", "PASS",
  "A", "LEVEL1", "PASS",
  "B", "LEVEL2", "PASS",
  "A", "LEVEL1", "PASS")

# count each combination and sort by frequency, descending
df %>% count(col1, col2, col3, sort = TRUE)

score 2 · Answer 2 · answered Nov 21 '17 at 10:05

You can use group_by in dplyr:

library(dplyr)


x <- data.frame(
  letter = c("A", "A", "B", "A", "B", "A"),
  level = c("LEVEL 1", "LEVEL 1", "LEVEL 2", "LEVEL 1", "LEVEL 2", "LEVEL 1"),
  text = c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS")
)

df <- x %>%
     group_by_all() %>%
     count()

or you can do:

df <- x %>%
     group_by(letter, level, text) %>%
     count()

output:

> df <- x %>% group_by_all() %>% count()
> df
# A tibble: 3 x 4
# Groups:   letter, level, text [3]
  letter   level   text     n
  <fctr>  <fctr> <fctr> <int>
1      A LEVEL 1   FAIL     1
2      A LEVEL 1   PASS     3
3      B LEVEL 2   PASS     2

answered Nov 21 '17 at 10:05

Matt W.


why would you use `group_by` before `count`? – D Pinto Nov 21 '17 at 10:10
I suppose in the event there are other columns in the df. But I guess you're right, count is all that is needed. – Matt W. Nov 21 '17 at 10:11
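
To illustrate the point made in the comments above, here is a minimal sketch comparing the two forms on the same example data. It assumes dplyr is installed; the names grouped and direct are only illustrative.

```r
library(dplyr)

# Same example data as in the answer above
x <- data.frame(
  letter = c("A", "A", "B", "A", "B", "A"),
  level  = c("LEVEL 1", "LEVEL 1", "LEVEL 2", "LEVEL 1", "LEVEL 2", "LEVEL 1"),
  text   = c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS")
)

# Explicit grouping followed by count(); ungroup() drops the grouping afterwards
grouped <- x %>%
  group_by(letter, level, text) %>%
  count() %>%
  ungroup()

# count() with the columns passed directly does the grouping itself
direct <- x %>% count(letter, level, text)

# Compare the counts produced by the two approaches
identical(grouped$n, direct$n)
```

If the data frame had extra columns that should not take part in the count, naming the columns explicitly in either form would keep them out of the grouping.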

score 1 · Answer 3 · answered Nov 21 '17 at 10:02

You can use the table function.

ex <- data.frame("letter" = c("A", "A", "B", "A", "B", "A"),
                 "level" = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
                 "test" = c("PASS", "FAIL", rep("PASS", 4)))


ex

res <- data.frame(table(ex$level, ex$test))
colnames(res) <- c("level", "test", "freq")

You can later merge the result data.frame with the original one.
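
The call above tabulates only level and test. A base R sketch that covers all three columns, drops combinations that never occur, and sorts by frequency might look like this (the name res_all is illustrative, not part of the original answer):

```r
ex <- data.frame(
  letter = c("A", "A", "B", "A", "B", "A"),
  level  = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
  test   = c("PASS", "FAIL", rep("PASS", 4))
)

# table() applied to a data frame cross-tabulates every column
res_all <- as.data.frame(table(ex), responseName = "freq")

# Drop combinations that never occur, then sort by frequency (descending)
res_all <- res_all[res_all$freq > 0, ]
res_all <- res_all[order(-res_all$freq), ]
res_all
```

The filter on freq > 0 matters because table() reports every possible combination of levels, including those with a count of zero.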

answered Nov 21 '17 at 10:02

P. Denelle


Thank you all. I am just wondering: is there a conflict between dplyr and plyr? – Manal Nov 21 '17 at 10:22
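
On the plyr/dplyr question in the comment above: both packages export a function named count(), and whichever package is attached last masks the other. A likely cause of the "Column vars is unknown" error is that dplyr's count() was receiving plyr-style arguments. One way to sidestep the clash, sketched here on the assumption that both packages are installed, is to qualify each call with its namespace (the names freq_plyr and freq_dplyr are illustrative):

```r
ex <- data.frame(
  letter = c("A", "A", "B", "A", "B", "A"),
  level  = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
  test   = c("PASS", "FAIL", rep("PASS", 4))
)

# plyr's version: counts every combination; the result column is named "freq"
freq_plyr <- plyr::count(ex)
freq_plyr <- freq_plyr[order(-freq_plyr$freq), ]

# dplyr's version: columns are passed unquoted; the result column is named "n"
freq_dplyr <- dplyr::count(ex, letter, level, test, sort = TRUE)
```

With the package:: prefix, the order in which the packages were attached no longer determines which count() runs.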

score 1 · Answer 4 · edited Nov 21 '17 at 10:15

Here is a tidyverse approach using n():

library(tidyverse)

df <- tibble(
  id = c("A", "A", "B", "A", "B", "A"),
  level = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
  type = factor(c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS"))
)

df %>% 
  group_by(id, level, type) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

# A tibble: 3 x 4
# Groups:   id, level [?]
     id  level   type     n
  <chr>  <chr> <fctr> <int>
1     A LEVEL1   PASS     3
2     B LEVEL2   PASS     2
3     A LEVEL1   FAIL     1
