
I have a vector (well, a data frame with only one column) that contains vectors of uneven lengths:

data = list(
c(349, 364, 393, 356, 357, 394, 334, 394, 343, 365, 349),
c(390, 336, 752, 377),
c(670, 757, 405, 343, 1109, 350, 372),
c(0, 0),
c(),
c(1115, 394, 327, 356, 408, 329, 385, 357, 357))

I would like to convert it to a matrix, filling the gaps with NA, so that it looks something like this:

349,  364, 393, 356, 357,  394, 334, 394, 343, 365, 349
390,  336, 752, 377, NA,   NA,  NA,  NA,  NA,  NA,  NA
670,  757, 405, 343, 1109, 350, 372, NA,  NA,  NA,  NA
0,    0,   NA,  NA,  NA,   NA,  NA,  NA,  NA,  NA,  NA
NA,   NA,  NA,  NA,  NA,   NA,  NA,  NA,  NA,  NA,  NA
1115, 394, 327, 356, 408,  329, 385, 357, 357, NA,  NA

Eventually, I would also like to get rid of the rows that contain only NAs. I have tried

data = sapply(data[,1], FUN=unlist)

and then

data = sapply(data, '[', seq(max(sapply(data, length))))

but I keep getting a matrix with all the elements unlisted in one column. Please advise.
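For comparison, a minimal sketch of the second attempt, assuming `data` is a plain list whose empty entry is `numeric(0)` rather than `c()`. Indexing past the end of a vector with `[` returns NA, so each entry does get padded; however, `sapply()` stores each entry as a column, so `t()` is needed to get one row per entry. The names `n`, `cols` and `m` are illustrative only.

```r
# Assumption: the empty entry is numeric(0). With c() (i.e. NULL),
# NULL[i] stays NULL and sapply() can no longer build a matrix.
data <- list(
  c(349, 364, 393, 356, 357, 394, 334, 394, 343, 365, 349),
  c(390, 336, 752, 377),
  c(670, 757, 405, 343, 1109, 350, 372),
  c(0, 0),
  numeric(0),
  c(1115, 394, 327, 356, 408, 329, 385, 357, 357))

n <- max(lengths(data))                # length of the longest entry
cols <- sapply(data, `[`, seq_len(n))  # out-of-range indices give NA
m <- t(cols)                           # transpose: one row per entry
print(dim(cols))                       # one column per entry
print(m)
```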

Marius
  • Your `data` is giving errors. Please check for typos. Did you mean `list` instead of `as.vector`? – akrun Aug 31 '16 at 10:54
  • Yeah, sorry. Corrected now. It is in fact a data frame with one column resulting from the `aggregate` function – Marius Aug 31 '16 at 11:18
  • Probably you used `list` as `FUN` in `aggregate` to list the `unique` elements. – akrun Aug 31 '16 at 11:20
  • No, I actually used `FUN=diff` in `aggregate` in order to get time differences between consecutive rows having the same identifier – Marius Aug 31 '16 at 11:21
  • Possible duplicate of [R: convert asymmetric list to matrix - number of elements in each sub-list differ](http://stackoverflow.com/questions/11148429/r-convert-asymmetric-list-to-matrix-number-of-elements-in-each-sub-list-diffe) – Ronak Shah Aug 31 '16 at 11:30

1 Answer


I guess `data` should be a list rather than a vector; then this code works:

t(sapply(data, `length<-`, max(lengths(data))))

NOTE: `lengths` (added in R 3.2.0) is a faster replacement for `sapply(data, length)`.
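A small illustration of the two building blocks used in the one-liner, on made-up toy vectors (`x` and `l` are hypothetical): `length<-` pads a vector with NA when the new length is larger than the old one, and `lengths()` returns the same integer vector as `sapply(l, length)`.

```r
x <- c(0, 0)
padded <- `length<-`(x, 5)         # extra positions are filled with NA
print(padded)

l <- list(1:3, numeric(0), c(0, 0))
print(lengths(l))                  # element lengths, computed in one call
print(sapply(l, length))           # same result via an explicit loop
```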

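where `data` is defined as below (note `numeric(0)` rather than `c()` for the empty element: `c()` is `NULL`, which `length<-` cannot pad):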

data = list(
  c(349, 364, 393, 356, 357, 394, 334, 394, 343, 365, 349),
  c(390, 336, 752, 377),
  c(670, 757, 405, 343, 1109, 350, 372),
  c(0, 0),
  numeric(0),
  c(1115, 394, 327, 356, 408, 329, 385, 357, 357))
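The question also asks how to drop the rows that contain only NAs. One way, sketched here with the `data` list above and hypothetical names `m`, `keep` and `m_clean`, is to keep the rows where `rowSums(!is.na(m))` is positive:

```r
data <- list(
  c(349, 364, 393, 356, 357, 394, 334, 394, 343, 365, 349),
  c(390, 336, 752, 377),
  c(670, 757, 405, 343, 1109, 350, 372),
  c(0, 0),
  numeric(0),
  c(1115, 394, 327, 356, 408, 329, 385, 357, 357))

m <- t(sapply(data, `length<-`, max(lengths(data))))
keep <- rowSums(!is.na(m)) > 0      # TRUE for rows with at least one value
m_clean <- m[keep, , drop = FALSE]  # drop = FALSE keeps a matrix even if one row is left
print(dim(m_clean))
```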
akrun
  • Thank you. It works. In my case, since the data is a one-column `data.frame` produced by the `aggregate` function, I had to use ``t(sapply(data[,1], `length<-`, max(lengths(data[,1]))))``. Of course it can always be coerced. Thanks once again! – Marius Aug 31 '16 at 11:20
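A minimal sketch of the one-column data frame case, assuming the differences end up in a list column (the data frame `df` and its columns `id` and `diffs` are made up for illustration; a real `aggregate` result may be shaped differently). Selecting that single column with `df[, 2]` returns the underlying list, so the same one-liner applies:

```r
# Hypothetical data frame with a list column of uneven-length vectors
df <- data.frame(id = 1:3)
df$diffs <- list(c(5, 10), 6, numeric(0))

col <- df[, 2]  # a single selected column is returned as the list itself
m <- t(sapply(col, `length<-`, max(lengths(col))))
print(m)
```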