How to put vectors with different length together to one matrix

Question

I’ve got a data frame like this:

    id                 class  
   146                H02J
   146                F03D
   146                F03D
   287                F16F
   287                F16F
  1040                F03D
  1040                F16D
  1040                F03D
  1042                F03D
  1042                G01W
  1042                F03D
  1042                F03D
  1042                F03D
  1816                G06F
  1816                H04Q
  1816                H04L
  1816                H04W

Now I want to build one numeric vector per application (id), in which each number codes one class.

Because the vectors have different lengths, I can't combine them into a matrix with my current R skills, so I'd be grateful for ideas on how to solve this.

The goal is to compute the distances between the vectors, so the output should be a matrix like this:

    > mat
         [,1] [,2] [,3] [,4] [,5]
    [1,]    6    1    1   NA   NA
    [2,]    3    3   NA   NA   NA
    [3,]    1    2    1   NA   NA
    [4,]    1    4    1    1    1
    [5,]    5    8    7    9   NA

I got this with:

    v1 <- subset(num, id==146)
    v2 <- subset(num, id==287)
    v3 <- subset(num, id==1040)
    v4 <- subset(num, id==1042)
    v5 <- subset(num, id==1816)

    list <- list(c(v1), c(v2), c(v3), c(v4), c(v5))
    list
    max.length <- max(sapply(list, length))
    list <- lapply(list, function(x) { c(x, rep(NA, max.length-length(x)))})
    do.call(rbind, list)
    mat <- do.call(rbind, list)

But the solution should work not only for these five examples but for a large number of ids (vectors), without typing in each id manually.

I can't reproduce this. I don't see this "num" object. It looks like you have only one operation that isn't already in a loop/*apply (and so generalizable to more vectors). Something like `mylist <- lapply(unique(df$ids),function(i)subset(num,id==i))` might work for that. You probably don't want to name your list "list", by the way. — Frank, Feb 17 '15 at 19:19
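A base-R sketch of the comment's suggestion: loop over the ids instead of writing one `subset()` per id, then reuse the NA-padding idea from the question. It assumes the data frame from the question is available as `d` with `class` stored as a factor (as in the `dput()` output of Answer 2); `split()` builds one integer vector per id, so no id has to be typed by hand.

```r
# Sample data from the question (assumed name: d; class stored as a factor)
d <- data.frame(
  id = c(146, 146, 146, 287, 287, 1040, 1040, 1040,
         1042, 1042, 1042, 1042, 1042, 1816, 1816, 1816, 1816),
  class = factor(c("H02J", "F03D", "F03D", "F16F", "F16F",
                   "F03D", "F16D", "F03D",
                   "F03D", "G01W", "F03D", "F03D", "F03D",
                   "G06F", "H04Q", "H04L", "H04W"))
)

# One integer vector per id; factor levels are coded alphabetically (F03D = 1, ...)
vecs <- split(as.integer(d$class), d$id)

# Pad every vector with NA up to the longest one, then stack them as rows
max_len <- max(lengths(vecs))
padded <- lapply(vecs, function(x) c(x, rep(NA_integer_, max_len - length(x))))
mat <- do.call(rbind, padded)  # row names are the ids
mat
```

The rows come out in ascending id order, because `split()` groups by the sorted levels of `factor(d$id)`. `lengths()` needs R 3.2.0 or later.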

Sven Hohenstein · Accepted Answer · score 3 · 2015-02-17T19:59:35.790

You can use rbind.fill.matrix from the plyr package:

    library(plyr)
    do.call(rbind.fill.matrix,  tapply(as.integer(num$class), num$id, t))

The result:

         1 2  3  4  5
    [1,] 6 1  1 NA NA
    [2,] 3 3 NA NA NA
    [3,] 1 2  1 NA NA
    [4,] 1 4  1  1  1
    [5,] 5 8  7  9 NA
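Since the stated goal is a distance matrix, here is a sketch of passing the padded matrix straight to `stats::dist()`. According to `?dist`, NA entries are allowed: a pair of rows is compared only on the columns where both have values, and the sum is scaled up to the full column count. Keep in mind that the integer codes are just alphabetical factor positions, so a Euclidean distance treats H02J (6) as closer to G06F (5) than to F03D (1); whether that is meaningful depends on your application.

```r
# Padded matrix from the output above (NA marks positions an id doesn't have)
mat <- rbind(
  c(6, 1,  1, NA, NA),
  c(3, 3, NA, NA, NA),
  c(1, 2,  1, NA, NA),
  c(1, 4,  1,  1,  1),
  c(5, 8,  7,  9, NA)
)
rownames(mat) <- c(146, 287, 1040, 1042, 1816)

# Euclidean distance; NA pairs are skipped and the sum rescaled (see ?dist)
dd <- dist(mat, method = "euclidean")
round(as.matrix(dd), 2)
```

For ids 146 and 287 only columns 1 and 2 are shared: (6 - 3)^2 + (1 - 3)^2 = 13, scaled by 5/2 to 32.5, so the distance is sqrt(32.5) ≈ 5.70.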

answered Feb 17 '15 at 19:21 by Sven Hohenstein · edited Feb 17 '15 at 19:59

score 1 · Answer 2 · answered Feb 17 '15 at 19:46

With the dplyr and tidyr packages, you can do:

    library(dplyr)
    library(tidyr)

    d %>% 
      group_by(id) %>% 
      mutate(i=1:n(),value=as.integer(class),class=NULL) %>% 
      spread(i,value)

    #     id 1 2  3  4  5
    # 1  146 6 1  1 NA NA
    # 2  287 3 3 NA NA NA
    # 3 1040 1 2  1 NA NA
    # 4 1042 1 4  1  1  1
    # 5 1816 5 8  7  9 NA

where d is the sample data set:

    d <- structure(list(id = c(146L, 146L, 146L, 287L, 287L, 1040L, 1040L, 
    1040L, 1042L, 1042L, 1042L, 1042L, 1042L, 1816L, 1816L, 1816L, 
    1816L), class = structure(c(6L, 1L, 1L, 3L, 3L, 1L, 2L, 1L, 1L, 
    4L, 1L, 1L, 1L, 5L, 8L, 7L, 9L), .Label = c("F03D", "F16D", "F16F", 
    "G01W", "G06F", "H02J", "H04L", "H04Q", "H04W"), class = "factor")), .Names = c("id", 
    "class"), class = "data.frame", row.names = c(NA, -17L))

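The `i=1:n()` step is what makes this work: it numbers each class within its id, and that running position becomes the column. The same idea can be sketched in base R without dplyr/tidyr, using `ave()` for the within-id position and matrix indexing to drop each code into its cell. The sample data `d` is rebuilt compactly here so the snippet runs on its own.

```r
# Sample data d from above, rebuilt compactly (class stored as a factor)
d <- data.frame(
  id = c(146, 146, 146, 287, 287, 1040, 1040, 1040,
         1042, 1042, 1042, 1042, 1042, 1816, 1816, 1816, 1816),
  class = factor(c("H02J", "F03D", "F03D", "F16F", "F16F",
                   "F03D", "F16D", "F03D",
                   "F03D", "G01W", "F03D", "F03D", "F03D",
                   "G06F", "H04Q", "H04L", "H04W"))
)

# Position of each row within its id: 1, 2, 3, 1, 2, 1, 2, 3, ...
pos <- ave(seq_along(d$id), d$id, FUN = seq_along)

# One row per id (in order of first appearance), one column per position
ids <- unique(d$id)
mat <- matrix(NA_integer_, nrow = length(ids), ncol = max(pos),
              dimnames = list(as.character(ids), NULL))

# Write each class code into its (id, position) cell; unused cells stay NA
mat[cbind(match(d$id, ids), pos)] <- as.integer(d$class)
mat
```
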
score 0 · Answer 3 · answered Feb 17 '15 at 19:22

You can use the dcast function from the reshape2 package.

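    library(reshape2)

    # One row per id, one column per class; each cell counts occurrences
    x <- dcast(num, id ~ class, value.var = "class", fun.aggregate = length)

    mat <- as.matrix(x[, -1])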

Note that the column names of this matrix are the distinct values of your class column, and each cell counts how often that class occurs for the id. Classes an id doesn't have show up as 0 rather than NA, which is more appropriate for computing distances.
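If this count representation is what you want, base R's `table()` should give the same id-by-class counts without reshape2. A sketch, assuming the sample data is available as `d` (rebuilt below); here the ids end up as row names rather than a separate column.

```r
# Sample data (assumed name: d; class stored as a factor)
d <- data.frame(
  id = c(146, 146, 146, 287, 287, 1040, 1040, 1040,
         1042, 1042, 1042, 1042, 1042, 1816, 1816, 1816, 1816),
  class = factor(c("H02J", "F03D", "F03D", "F16F", "F16F",
                   "F03D", "F16D", "F03D",
                   "F03D", "G01W", "F03D", "F03D", "F03D",
                   "G06F", "H04Q", "H04L", "H04W"))
)

# Cross-tabulate: rows = ids, columns = classes, cells = number of occurrences
counts <- unclass(table(d$id, d$class))

# No NAs here, so dist() uses every column for every pair of ids
dd <- dist(counts)
counts
```
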
