
I have a data frame, Data:

Data <- data.frame(A=sample(1:7),B=c(5,5,5,6,6,6,6),C=c(1,2,2,3,3,4,5))
  A B C
1 6 5 1
2 7 5 2
3 4 5 2
4 2 6 3
5 1 6 3
6 5 6 4
7 3 6 5    

I am trying to extract the unique values from each of the columns into a data.frame. Each column has a different set and number of unique values.

I am looking for something like:

A  1   2   3   4   5   6   7 
B  5   6   NA  NA  NA  NA  NA
C  1   2   3   4   5   NA  NA

I was able to loop over the columns and collect the values in a list (I used a list because the sets of unique values have different lengths):

vars <- c('A','B','C')
mylist <- vector("list", length(vars))
for (i in seq_along(vars)) {
  # names(table(...)) gives the sorted unique values, but as character strings
  mylist[[i]] <- names(table(Data[, vars[i]]))
}

How can I get the information into a data.frame, ideally without a loop? Thanks!

Ronak Shah
Elks
  • I think `lapply(Data, unique)` would be best – Rich Scriven Jan 18 '15 at 18:10
  • After Richard's suggestion, you can have a look [here](https://stackoverflow.com/questions/27598702/converting-unsymmetric-vector-list-into-matrix) if you want to convert to matrix/data.frame. – talat Jan 18 '15 at 18:15

2 Answers


lapply() is sufficient for this. Here's the trick I use.

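xx <- lapply(Data, unique)  # unique values of each column, as a named list
# pad every element with NA up to the longest one, then stack them as rows
data.frame(do.call(rbind, lapply(xx, "length<-", max(vapply(xx, length, 1L)))))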
#   X1 X2 X3 X4 X5 X6 X7
# A  2  3  6  5  1  7  4
# B  5  6 NA NA NA NA NA
# C  1  2  3  4  5 NA NA

First, we iterate over the columns of Data to find the unique values of each one. Then we iterate over that list, using `length<-` to pad each element with NAs up to the length of the longest element of xx. Finally, do.call(rbind, ...) stacks the padded vectors as rows, and data.frame() turns the result into a data frame.
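unique() keeps values in order of first appearance, so row A above is not sorted the way it is in the desired output. The sketch below shows what `length<-` does on its own. It then adds sort() so the result matches the table in the question. It uses the fixed values printed in the question in place of sample(1:7). It also uses lengths() (available since R 3.2.0) in place of vapply(xx, length, 1L). Both are assumptions of this sketch, not part of the original answer.

```r
# Assumption: the values printed in the question, fixed instead of sample(1:7)
Data <- data.frame(A = c(6L, 7L, 4L, 2L, 1L, 5L, 3L),
                   B = c(5, 5, 5, 6, 6, 6, 6),
                   C = c(1, 2, 2, 3, 3, 4, 5))

# `length<-` pads a vector with NA when the new length is larger
x <- c(5, 6)
length(x) <- 4  # x is now c(5, 6, NA, NA)

# Sorted unique values per column, padded to the longest set
xx <- lapply(Data, function(col) sort(unique(col)))
n <- max(lengths(xx))
res <- data.frame(do.call(rbind, lapply(xx, `length<-`, n)))
print(res)
#   X1 X2 X3 X4 X5 X6 X7
# A  1  2  3  4  5  6  7
# B  5  6 NA NA NA NA NA
# C  1  2  3  4  5 NA NA
```

Sorting inside the lapply() call, before padding, keeps the NAs at the end of each row.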

Rich Scriven
  • Also, similarly, `data.frame(do.call(rbind, lapply(xx, "[", seq_len(nrow(Data)))))` – alexis_laz Jan 18 '15 at 18:21
  • @alexis_laz - only if the longest unique element is the same as `nrow(Data)`. Otherwise you'll pad all elements with extra NAs – Rich Scriven Jan 18 '15 at 18:23
  • @alexis_laz - An example of unwanted NA's because `nrow(df)` is not the same, `df <- data.frame(x = c(1,2,1,3), y = c(1,2,1,2)); xx <- lapply(df, unique); lapply(xx, "[", seq_len(nrow(df)))` – Rich Scriven Jan 18 '15 at 18:30

Here's a possible data.table solution:

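library(data.table)
# setDT() converts Data to a data.table by reference (Data itself is modified)
data.frame(t(setDT(Data)[, lapply(.SD, function(x) {
  temp <- unique(x)
  # sorted unique values, padded with NA up to the column length, i.e. nrow(Data)
  c(sort(temp), rep(NA, length(x) - length(temp)))
})]))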

#   X1 X2 X3 X4 X5 X6 X7
# A  1  2  3  4  5  6  7
# B  5  6 NA NA NA NA NA
# C  1  2  3  4  5 NA NA
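The data.table version pads each column to length(x), which is nrow(Data). That caveat was raised in the comments on the first answer. It gives exactly the desired width here only because column A has seven distinct values. The base-R sketch below contrasts the two padding targets on Rich Scriven's example data from those comments. Padding to nrow leaves a trailing all-NA column, but padding to the longest set of unique values does not. The variable names are hypothetical, chosen for illustration.

```r
# Rich Scriven's example from the comments: no column has nrow(df) unique values
df <- data.frame(x = c(1, 2, 1, 3), y = c(1, 2, 1, 2))
u <- lapply(df, function(col) sort(unique(col)))

# Padding to nrow(df), as the data.table answer does with length(x)
by_nrow <- do.call(rbind, lapply(u, `length<-`, nrow(df)))

# Padding to the longest set of unique values
by_max <- do.call(rbind, lapply(u, `length<-`, max(lengths(u))))

print(by_nrow)  # 4 columns, the last one all NA
print(by_max)   # 3 columns
```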
David Arenburg