
I have items in different lists, and I want to count the items in each list and output the counts to a table. However, I run into difficulty when the lists contain different items. To illustrate my problem:

item_1 <- c("A","A","B")
item_2 <- c("A","B","B","B","C")
item_3 <- c("C","A")
item_4 <- c("D","A", "A")
item_5 <- c("B","D")


list_1 <- list(item_1, item_2, item_3)
list_2 <- list(item_4, item_5)

table_1 <- table(unlist(list_1))
table_2 <- table(unlist(list_2))

> table_1

A B C 
4 4 2 
> table_2

A B D 
2 1 2 

What I get from cbind is:

> cbind(table_1, table_2)

  table_1 table_2
A       4       2
B       4       1
C       2       2

which is clearly wrong: cbind combines the tables by position rather than by name, so the count for D ends up in row C. What I need is:

  table_1 table_2
A       4       2
B       4       1
C       2       0
D       0       2

Thanks in advance

Roland
Adrien
  • You could make your character vectors factors (with levels that include all possible values). – Roland Sep 03 '14 at 08:19


It would probably be better to use factors at the start if possible, something like:

L <- list(list_1 = list_1, 
          list_2 = list_2)
RN <- unique(unlist(L))
do.call(cbind, 
        lapply(L, function(x)
          table(factor(unlist(x), RN))))
#   list_1 list_2
# A      4      2
# B      4      1
# C      2      0
# D      0      2

However, going with what you have, a function like the following might be useful. I've added comments to help explain what's happening in each step.

myFun <- function(..., fill = 0) {
  ## Get the names of the ...s. These will be our column names
  CN <- sapply(substitute(list(...))[-1], deparse)
  ## Put the ...s into a list
  Lst <- setNames(list(...), CN)
  ## Get the relevant row names
  RN <- unique(unlist(lapply(Lst, names), use.names = FALSE))
  ## Create an empty matrix. `fill` can be any value; it defaults to 0
  M <- matrix(fill, length(RN), length(CN),
              dimnames = list(RN, CN))
  ## Use match to identify the correct row to fill in
  Row <- lapply(Lst, function(x) match(names(x), RN))
  ## use matrix indexing to fill in the unlisted values of Lst
  M[cbind(unlist(Row), 
          rep(seq_along(Lst), vapply(Row, length, 1L)))] <-
    unlist(Lst, use.names = FALSE)
  ## Return your matrix
  M
}

Applied to your two tables, the outcome is like this:

myFun(table_1, table_2)
#   table_1 table_2
# A       4       2
# B       4       1
# C       2       0
# D       0       2

Here's an example that adds a third table to the problem. It also demonstrates using NA as the fill value.

set.seed(1) ## So you can get the same results as me (draws can differ across R versions; sample() changed in R 3.6.0)
table_3 <- table(sample(LETTERS[3:6], 20, TRUE))
table_3
# 
# C D E F 
# 2 7 9 2

myFun(table_1, table_2, table_3, fill = NA)
#   table_1 table_2 table_3
# A       4       2      NA
# B       4       1      NA
# C       2      NA       2
# D      NA       2       7
# E      NA      NA       9
# F      NA      NA       2
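The step in myFun that does the real work is matrix indexing: indexing a matrix with a two-column matrix of (row, column) pairs addresses one cell per pair. A minimal standalone illustration, where the matrix, the row names A to C and the column names t1/t2 are toy values chosen only for this example:

```r
# A toy 3 x 2 matrix pre-filled with 0, with row and column names
M <- matrix(0, nrow = 3, ncol = 2,
            dimnames = list(c("A", "B", "C"), c("t1", "t2")))

# Each row of `idx` is one (row, column) position to write to
idx <- cbind(c(1, 3, 2), c(1, 1, 2))

# Values are assigned to those positions in order
M[idx] <- c(5, 7, 9)
M
#   t1 t2
# A  5  0
# B  0  9
# C  7  0
```

In myFun, the row positions come from `match(names(x), RN)` and the column positions repeat each table's index once per entry, so all tables are written into the matrix in a single assignment.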
A5C1D2H2I1M1N2O1R2T1

To fix your existing problem, you can put the two tables into a list and add the missing values and names back in. Here, nm is a vector of all the unique item names, tbs is a list of the tables, and sapply appends the missing value (0) to each table, names it, and reorders the result to match nm. Note that this assumes each table is missing exactly one of the names in nm.

> nm <- unique(unlist(mget(paste("item", 1:5, sep = "_"))))
> tbs <- list(t1 = table_1, t2 = table_2)
> sapply(tbs, function(x) {
      x[4] <- 0L
      names(x)[4] <- nm[!nm %in% names(x)]
      x[nm]
  })
  t1 t2
A  4  2
B  4  1
C  2  0
D  0  2

A more general solution, which works when you don't know in advance which names are missing and keeps the missing counts as NA, is:

> sapply(tbs, function(x) {
      length(x) <- length(nm)
      x <- x[match(nm, names(x))]
      setNames(x, nm)
  })
  t1 t2
A  4  2
B  4  1
C  2 NA
D NA  2

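But you could have avoided this entirely by going straight from the items to a table: you put the items into a list and then unlisted them in the very next step. Note that table's useNA argument only controls whether NA values are counted; it does not add zero counts for values that don't occur, so C is still missing from the output below. To keep zero counts, the vectors need to be factors whose levels include every possible value.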

> t1 <- table(c(item_1, item_2, item_3), useNA = "always")
> t2 <- table(c(item_4, item_5), useNA = "always")
> t2

   A    B    D <NA> 
   2    1    2    0 
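To keep zero counts while still tabulating the vectors directly, one option (essentially the factor idea from the first answer and Roland's comment) is to give both tables the same factor levels. A minimal sketch, assuming the full set of values can be collected from all the items; `lv` is a name introduced here for illustration:

```r
item_1 <- c("A","A","B")
item_2 <- c("A","B","B","B","C")
item_3 <- c("C","A")
item_4 <- c("D","A","A")
item_5 <- c("B","D")

# Collect every value that occurs in any item; these become the shared levels
lv <- sort(unique(c(item_1, item_2, item_3, item_4, item_5)))

# Tabulate each group of vectors as a factor with the shared levels,
# so values that never occur in a group are still listed with a count of 0
t1 <- table(factor(c(item_1, item_2, item_3), levels = lv))
t2 <- table(factor(c(item_4, item_5), levels = lv))

# Both tables now have the same names in the same order, so cbind lines up
cbind(t1, t2)
#   t1 t2
# A  4  2
# B  4  1
# C  2  0
# D  0  2
```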
Rich Scriven
  • This is not very easily extendable though. What if the second table were like "table_3" in my answer? – A5C1D2H2I1M1N2O1R2T1 Sep 03 '14 at 10:10
  • I can't even find `table_3` in your answer. The problem happened in step 1, and I was just illustrating how to avoid it. And did you notice OP went vector > list > unlist > table > cbind? Table the vectors and it's done. – Rich Scriven Sep 03 '14 at 10:15
  • "table_3" is towards the end of my answer, to demonstrate further features of the function. How would you extend your answer if, for instance, "item_5" was `c("B","D","X")`? – A5C1D2H2I1M1N2O1R2T1 Sep 03 '14 at 10:23
  • Why are you asking that? It wasn't posed in the question. I would match the names. – Rich Scriven Sep 03 '14 at 10:26
  • Because the best questions and answers on SO tend to be those which are more generally applicable rather than those which only solve the OP's specific question. – A5C1D2H2I1M1N2O1R2T1 Sep 03 '14 at 10:28
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/60525/discussion-between-ananda-mahto-and-richard-scriven). – A5C1D2H2I1M1N2O1R2T1 Sep 03 '14 at 10:31

A quick fix to your problem is to convert the tables into data frames and then merge them:

    d1 <- data.frame(value=names(table_1), table_1=as.numeric(table_1))
    d2 <- data.frame(value=names(table_2), table_2=as.numeric(table_2))
    merge(d1,d2, all=TRUE)

This creates NAs where you might want 0s. That can be fixed with:

    M <- merge(d1,d2, all=TRUE) 
    M[is.na(M)] <- 0
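The merge approach extends to more than two tables by folding merge over a list of data frames with Reduce. A minimal sketch, where `tab_to_df` is a hypothetical helper written for this example (not part of the original answer):

```r
item_1 <- c("A","A","B")
item_2 <- c("A","B","B","B","C")
item_3 <- c("C","A")
item_4 <- c("D","A","A")
item_5 <- c("B","D")
table_1 <- table(unlist(list(item_1, item_2, item_3)))
table_2 <- table(unlist(list(item_4, item_5)))

# Hypothetical helper: turn one table into a two-column data frame,
# with the count column named after the table
tab_to_df <- function(tb, col) {
  d <- data.frame(value = names(tb), count = as.numeric(tb),
                  stringsAsFactors = FALSE)
  names(d)[2] <- col
  d
}

tabs <- list(table_1 = table_1, table_2 = table_2)
dfs <- Map(tab_to_df, tabs, names(tabs))

# Full outer join of all data frames on `value`, then replace NAs with 0
M <- Reduce(function(a, b) merge(a, b, by = "value", all = TRUE), dfs)
M[is.na(M)] <- 0
M
#   value table_1 table_2
# 1     A       4       2
# 2     B       4       1
# 3     C       2       0
# 4     D       0       2
```

Adding a third table only means adding it to `tabs`; the Reduce call stays the same.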