I have a nested list that contains country names. I want to count how many sub-lists each country appears in: each sub-list adds 1 to a country's count, regardless of how often the country is repeated within that sub-list.

For instance, if I have this list:

[[1]]
[1] "Austria" "Austria" "Austria"

[[2]]
[1] "Austria" "Sweden"

[[3]]
[1] "Austria" "Austria" "Sweden"  "Sweden" "Sweden" "Sweden"

[[4]]
[1] "Austria" "Austria" "Austria"

[[5]]
[1] "Austria" "Japan" 

... then I would like the result to be like this:

country        freq
====================
Austria         5
Sweden          2
Japan           1

I have tried various combinations of lapply, unlist, table, etc., but nothing worked the way I need. I would appreciate your help!

anpami

4 Answers

One way with lapply(), unlist() and table():

count <- table(unlist(lapply(lst, unique)))
count
# Austria   Japan  Sweden 
#       5       1       2 


as.data.frame(count)
#      Var1 Freq
# 1 Austria    5
# 2   Japan    1
# 3  Sweden    2
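
If the column names and ordering from the question are wanted, the table can be reshaped a little further. This is only a minimal sketch: it assumes that sorting by descending frequency is the desired order, and the names country_counts, country and freq are chosen here for illustration. The data is repeated so the snippet runs on its own.

```r
# Example data: the same list as in the question
lst <- list(
  c("Austria", "Austria", "Austria"),
  c("Austria", "Sweden"),
  c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"),
  c("Austria", "Austria", "Austria"),
  c("Austria", "Japan")
)

# Deduplicate within each sub-list, flatten, then tabulate
count <- table(unlist(lapply(lst, unique)))

# Convert to a data frame; responseName sets the name of the count column
country_counts <- as.data.frame(count, responseName = "freq",
                                stringsAsFactors = FALSE)
names(country_counts)[1] <- "country"

# Sort by descending frequency (assumed ordering) and reset the row names
country_counts <- country_counts[order(-country_counts$freq), ]
rownames(country_counts) <- NULL
country_counts
```

The key step is still lapply(lst, unique): once duplicates inside each sub-list are gone, a plain table() over the flattened vector counts sub-lists rather than mentions.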

Reproducible data (please provide this yourself next time):

lst <- list(
  c('Austria', 'Austria', 'Austria'), 
  c("Austria", "Sweden"), 
  c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"), 
  c("Austria", "Austria", "Austria"), 
  c("Austria", "Japan")
)
s_baldur

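Here is another base R option, which builds a logical presence matrix (one row per sub-list, one column per country) and sums its columns: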

colSums(
  do.call(
    rbind,
    lapply(
      lst,
      function(x) table(factor(x, levels = unique(unlist(lst)))) > 0
    )
  )
)

which gives

Austria  Sweden   Japan
      5       2       1
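
The same presence-matrix idea can be sketched with %in% instead of table(), which may be easier to read. This is an illustrative variant, not the code from the answer above; the names countries and present are made up for the example.

```r
# Example data: the same list as in the question
lst <- list(
  c("Austria", "Austria", "Austria"),
  c("Austria", "Sweden"),
  c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"),
  c("Austria", "Austria", "Austria"),
  c("Austria", "Japan")
)

# All distinct countries, in order of first appearance
countries <- unique(unlist(lst))

# One logical row per sub-list: TRUE where the country occurs in it
present <- do.call(rbind, lapply(lst, function(x) countries %in% x))
colnames(present) <- countries

# Summing the logical columns counts the sub-lists that contain each country
colSums(present)
```

Because the columns follow unique(unlist(lst)), the result keeps the order in which the countries first appear rather than alphabetical order.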
ThomasIsCoding

One way would be to convert the list to a data frame and, for each country, count the distinct sub-list indices in which it occurs.

library(dplyr)

tibble::enframe(lst) %>%
  tidyr::unnest(value) %>%
  group_by(value) %>%
  summarise(freq = n_distinct(name))


#   value    freq
#   <chr>   <int>
# 1 Austria     5
# 2 Japan       1
# 3 Sweden      2

data

lst <- list(c('Austria', 'Austria', 'Austria'), c("Austria", "Sweden"), 
     c("Austria", "Austria", "Sweden",  "Sweden", "Sweden", "Sweden"), 
     c("Austria", "Austria", "Austria"), c("Austria", "Japan" ))
Ronak Shah

Another option is to stack the list into a two-column data.frame, keep only the unique rows, and apply table() to the values column:

table(unique(stack(setNames(lst, seq_along(lst))))$values)

# Austria   Japan  Sweden 
#       5       1       2 
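
Splitting the one-liner into steps may make it easier to follow. A sketch with the intermediate results stored in the illustrative variables long and pairs (these names are not part of the original answer):

```r
# Example data: the same list as in the question
lst <- list(
  c("Austria", "Austria", "Austria"),
  c("Austria", "Sweden"),
  c("Austria", "Austria", "Sweden", "Sweden", "Sweden", "Sweden"),
  c("Austria", "Austria", "Austria"),
  c("Austria", "Japan")
)

# stack() needs a named list; the names end up in the `ind` column,
# so `long` has one row per mention: `values` (country), `ind` (sub-list)
long <- stack(setNames(lst, seq_along(lst)))

# unique() on a data frame drops duplicated rows, leaving one row
# per (country, sub-list) pair
pairs <- unique(long)

# Each remaining row is one sub-list that mentions the country
table(pairs$values)
```

Here long has one row per mention and pairs has one row per country and sub-list combination, which is why table() on pairs$values yields the per-sub-list counts.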
akrun