
I’ve got a data frame like this:

    id                 class  
   146                H02J
   146                F03D
   146                F03D
   287                F16F
   287                F16F
  1040                F03D
  1040                F16D
  1040                F03D
  1042                F03D
  1042                G01W
  1042                F03D
  1042                F03D
  1042                F03D
  1816                G06F
  1816                H04Q
  1816                H04L
  1816                H04W

Now I want to build one numeric vector per application (id), where each number in the vector encodes a class.

Because the vectors have different lengths, I can't (with my R skills) combine them into a matrix, and I'd be grateful for ideas on how to solve this.

The output should be a matrix like this, which I will then use to compute the distances between the vectors:

> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    6    1    1   NA   NA
[2,]    3    3   NA   NA   NA
[3,]    1    2    1   NA   NA
[4,]    1    4    1    1    1
[5,]    5    8    7    9   NA

I got this with:

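# class is a factor, so as.integer() returns each class's level code
v1 <- as.integer(subset(num, id == 146)$class)
v2 <- as.integer(subset(num, id == 287)$class)
v3 <- as.integer(subset(num, id == 1040)$class)
v4 <- as.integer(subset(num, id == 1042)$class)
v5 <- as.integer(subset(num, id == 1816)$class)

list <- list(v1, v2, v3, v4, v5)
max.length <- max(sapply(list, length))
# pad each vector with NA up to the longest length, then stack them as rows
list <- lapply(list, function(x) { c(x, rep(NA, max.length - length(x))) })
mat <- do.call(rbind, list)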

But the solution should work not only for these five examples but for a large number of ids (vectors), without typing in each id by hand.

Cœur
David
I can't reproduce this. I don't see this "num" object. It looks like you have only one operation that isn't already in a loop/*apply (and so generalizable to more vectors). Something like `mylist <- lapply(unique(num$id), function(i) subset(num, id == i))` might work for that. You probably don't want to name your list "list", by the way. – Frank Feb 17 '15 at 19:19
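Following the `lapply` idea in the comment above, a minimal base-R sketch that handles any number of ids might look like this. It swaps in `split()` for the `lapply`/`subset` loop and assumes, as the answers below do, that `num` is the data frame from the question with `class` stored as a factor; the data are retyped inline so the snippet runs on its own, and `vecs`, `max_len` and `padded` are illustrative names.

```r
# Sample data from the question (class stored as a factor)
num <- data.frame(
  id = c(146, 146, 146, 287, 287, 1040, 1040, 1040,
         1042, 1042, 1042, 1042, 1042, 1816, 1816, 1816, 1816),
  class = factor(c("H02J", "F03D", "F03D", "F16F", "F16F", "F03D", "F16D", "F03D",
                   "F03D", "G01W", "F03D", "F03D", "F03D",
                   "G06F", "H04Q", "H04L", "H04W"))
)

# One integer vector of class codes per id; split() does the grouping,
# so no id has to be typed by hand
vecs <- split(as.integer(num$class), num$id)

# Pad every vector with NA up to the longest length, then stack them as rows
max_len <- max(lengths(vecs))
padded <- lapply(vecs, function(v) c(v, rep(NA_integer_, max_len - length(v))))
mat <- do.call(rbind, padded)
mat
```

Because `split()` names each group after its id, the rows of `mat` carry the ids as row names, which keeps track of which vector belongs to which application.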

3 Answers


You can use rbind.fill.matrix from the plyr package:

library(plyr)
do.call(rbind.fill.matrix,  tapply(as.integer(num$class), num$id, t))

The result:

     1 2  3  4  5
[1,] 6 1  1 NA NA
[2,] 3 3 NA NA NA
[3,] 1 2  1 NA NA
[4,] 1 4  1  1  1
[5,] 5 8  7  9 NA
Sven Hohenstein

With the dplyr and tidyr packages, you can do:

library(dplyr)
library(tidyr)

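d %>%
  group_by(id) %>%
  mutate(i = row_number(),           # position of each class within its id
         value = as.integer(class),  # factor level code of the class
         class = NULL) %>%           # drop the original factor column
  spread(i, value)                   # one column per position, padded with NA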

#     id 1 2  3  4  5
# 1  146 6 1  1 NA NA
# 2  287 3 3 NA NA NA
# 3 1040 1 2  1 NA NA
# 4 1042 1 4  1  1  1
# 5 1816 5 8  7  9 NA

where d is the sample data set:

d <- structure(list(id = c(146L, 146L, 146L, 287L, 287L, 1040L, 1040L, 
1040L, 1042L, 1042L, 1042L, 1042L, 1042L, 1816L, 1816L, 1816L, 
1816L), class = structure(c(6L, 1L, 1L, 3L, 3L, 1L, 2L, 1L, 1L, 
4L, 1L, 1L, 1L, 5L, 8L, 7L, 9L), .Label = c("F03D", "F16D", "F16F", 
"G01W", "G06F", "H02J", "H04L", "H04Q", "H04W"), class = "factor")), .Names = c("id", 
"class"), class = "data.frame", row.names = c(NA, -17L))
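`spread()` still works, but tidyr has since superseded it with `pivot_wider()`. A sketch of the same reshaping with `pivot_wider()` follows, assuming tidyr 1.0 or later; `wide` is an illustrative name, and `d` is retyped in a compact form with the same values and factor levels as the `dput()` output above.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (same values as the dput() output above)
d <- data.frame(
  id = c(146L, 146L, 146L, 287L, 287L, 1040L, 1040L, 1040L,
         1042L, 1042L, 1042L, 1042L, 1042L, 1816L, 1816L, 1816L, 1816L),
  class = factor(c("H02J", "F03D", "F03D", "F16F", "F16F", "F03D", "F16D", "F03D",
                   "F03D", "G01W", "F03D", "F03D", "F03D",
                   "G06F", "H04Q", "H04L", "H04W"))
)

wide <- d %>%
  group_by(id) %>%
  mutate(i = row_number(),               # position of the class within its id
         value = as.integer(class)) %>%  # factor level code of the class
  ungroup() %>%
  select(id, i, value) %>%
  pivot_wider(names_from = i, values_from = value)  # absent positions become NA

# Drop the id column to get a plain numeric matrix, one row per id
mat <- as.matrix(wide[, -1])
rownames(mat) <- wide$id
mat
```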
Marat Talipov

You can use the dcast function from the reshape2 package.

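    library(reshape2)

    # one row per id, one column per class; each cell counts how often that class occurs
    x <- dcast(num, id ~ class, value.var = "class", fun.aggregate = length)

    mat <- as.matrix(x[, -1])
    rownames(mat) <- x$id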

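Note that this gives a different representation from the one in the question: the columns of the matrix are the distinct values of the class column, and each entry counts how often that class occurs for the given id. Classes that never occur for an id get 0 rather than NA, which is more convenient for computing distances.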
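Without reshape2, base R's `table()` should give the same id-by-class count matrix as the `dcast()` call with `length` (an assumption, not checked cell by cell here), and `dist()` then computes the distances the question is after. This sketch uses `dist()`'s default Euclidean method; `counts` and `dists` are illustrative names, and the data are retyped inline.

```r
# Sample data from the question (class stored as a factor)
num <- data.frame(
  id = c(146, 146, 146, 287, 287, 1040, 1040, 1040,
         1042, 1042, 1042, 1042, 1042, 1816, 1816, 1816, 1816),
  class = factor(c("H02J", "F03D", "F03D", "F16F", "F16F", "F03D", "F16D", "F03D",
                   "F03D", "G01W", "F03D", "F03D", "F03D",
                   "G06F", "H04Q", "H04L", "H04W"))
)

# Cross-tabulate ids against classes: each cell counts how often a class
# occurs for an id, and classes that never occur get 0
counts <- table(num$id, num$class)
mat <- unclass(counts)  # drop the "table" class, keep the integer matrix

# Euclidean distances between ids (the default method of dist())
dists <- dist(mat)
round(as.matrix(dists), 2)
```

For example, id 146 (two F03D, one H02J) and id 287 (two F16F) end up at distance sqrt(2^2 + 2^2 + 1^2) = 3.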

mcastillon