I have this matrix

> matrix(letters[1:10],2)
     [,1] [,2] [,3] [,4] [,5]
[1,] "a"  "c"  "e"  "g"  "i" 
[2,] "b"  "d"  "f"  "h"  "j" 

And I would like to have this output

    [,1] 
[1,] "acegi" 
[2,] "bdfhj" 

So basically a vector. How can I do that? I tried something like apply(matrix(letters[1:10],2),2,paste0), but it does not work.

Dambo

2 Answers

This seems to do what you want:

m <- matrix(letters[1:10], 2)
matrix(apply(m, 1, function(x) paste(x, collapse = '')))
    [,1]   
[1,] "acegi"
[2,] "bdfhj"
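
A minimal sketch of why the attempt from the question returns the matrix unchanged while the call above works, assuming the same 2 x 5 example matrix (the names by_col and by_row are only illustrative):

```r
m <- matrix(letters[1:10], 2)

# MARGIN = 2 walks over columns. paste0() without collapse returns each
# two-element column as it is, so the result has the same shape as m.
by_col <- apply(m, 2, paste0)
dim(by_col)

# MARGIN = 1 walks over rows. collapse = "" joins each row into one string.
by_row <- apply(m, 1, paste, collapse = "")
by_row
```

The two differences are the margin (rows instead of columns) and the collapse argument, which is what actually glues the elements of each row together.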
Gopala

Another option is to convert the matrix to a data.frame and use do.call with paste0:

m1 <- matrix(letters[1:10], 2)
matrix(do.call(paste0, as.data.frame(m1)))
#     [,1]   
#[1,] "acegi"
#[2,] "bdfhj"

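NOTE: paste0 is vectorized over whole columns here, so this is faster than looping over each row with apply when the matrix has many more rows than columns.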
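
As a rough illustration of what the do.call version does, assuming the small example matrix from the question (cols, via_do_call and by_hand are just illustrative names): a data.frame is a list of columns, and do.call passes each list element to paste0 as a separate argument.

```r
m1 <- matrix(letters[1:10], 2)

# A data.frame is a list with one element per column (V1 ... V5 here).
cols <- as.data.frame(m1)
length(cols)

# do.call() spreads that list into separate arguments, so this call ...
via_do_call <- do.call(paste0, cols)

# ... should give the same result as writing every column out by hand.
by_hand <- paste0(m1[, 1], m1[, 2], m1[, 3], m1[, 4], m1[, 5])

identical(via_do_call, by_hand)
```

paste0 is then called once on a handful of long vectors, rather than once per row.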

Benchmarks

set.seed(24)
m2 <- matrix(sample(letters, 1e7*4, replace=TRUE), ncol=4)
system.time(matrix(apply(m2, 1, paste, collapse="")))
#   user  system elapsed 
#  75.81    0.27   76.44 

system.time(matrix(do.call(paste0, as.data.frame(m2))))
#  user  system elapsed 
#   9.62    0.14    9.76 

Using @Frank's variation

system.time(matrix(do.call(paste0, split(m2, col(m2)))))
#  user  system elapsed 
#  9.54    0.19    9.75 
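
To see what the split variation does, here is a small sketch on the example matrix from the question (pieces is an illustrative name): col() returns a matrix of column numbers, and split() uses it to cut the matrix into one vector per column.

```r
m <- matrix(letters[1:10], 2)

# Same shape as m; every entry holds the number of the column it sits in.
col(m)

# Group the elements of m by column number: a list of 5 vectors.
pieces <- split(m, col(m))
pieces[[3]]

# Paste the column vectors together element-wise, as with the data.frame.
do.call(paste0, pieces)
```

This avoids building a data.frame, but the list it produces plays the same role.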

As @PierreLafortune wanted to check with a dataset that has more columns,

set.seed(49)
m2 <- matrix(sample(letters, 1e6*10, replace=TRUE), ncol=10)
system.time(matrix(apply(m2, 1, paste, collapse="")))
#  user  system elapsed 
#  8.90    0.00    8.89 
system.time(matrix(do.call(paste0, as.data.frame(m2))))
#  user  system elapsed 
#  1.92    0.00    1.92 

If the number of rows and columns is the same, say 5000 x 5000, then

set.seed(37)
m2 <- matrix(sample(letters, 5000*5000, replace=TRUE), ncol=5000)
system.time(matrix(apply(m2, 1, paste, collapse="")))
#   user  system elapsed 
#  5.42    0.00    5.42 
system.time(matrix(do.call(paste0, as.data.frame(m2))))
#  user  system elapsed 
#  7.42    0.00    7.43 
system.time({n = nrow(m2)
     do.call(paste0, lapply(seq_len(ncol(m2)),
       function(j) m2[seq(to=j*n, length.out=n)]))})
#  user  system elapsed 
#  6.19    0.00    6.20 

Here the apply method is slightly faster, but I assume that the OP's dataset will have more rows than columns.
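
The lapply variant in the last timing relies on R storing matrices column by column, so column j occupies the linear positions (j-1)*n + 1 through j*n. A small sketch on the example matrix from the question (idx and res are illustrative names):

```r
m <- matrix(letters[1:10], 2)
n <- nrow(m)

# Linear positions of column 3: the n positions ending at j * n.
j <- 3
idx <- seq(to = j * n, length.out = n)
identical(m[idx], m[, j])

# Extract every column by linear index, then paste them element-wise.
res <- do.call(paste0, lapply(seq_len(ncol(m)),
                              function(j) m[seq(to = j * n, length.out = n)]))
res
```

Indexing with a single vector skips the matrix-style m[, j] lookup, which may be where the small speed difference comes from.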

akrun
  • :) No, upvoted and then I'm like "Wow, can that really be? Lemme check". Still finding it kind of weird. It even does as well as `do.call(paste0, split(m2, col(m2)))` – Frank Apr 21 '16 at 02:44
  • Could it be the 4 columns? Try with more cols to see the data frame overhead maybe – Pierre L Apr 21 '16 at 02:46
  • I guess the minimum is around `system.time(paste0(m2[,1], m2[,2], m2[,3], m2[,4]))` and you're only 50% above that. It could probably be achieved with eval parse stuff (but I wouldn't do that). – Frank Apr 21 '16 at 02:48
  • Thanks for the exhaustive answer. For what it's worth, the fact that your solution was faster was quite counterintuitive to me. – Dambo Apr 21 '16 at 02:55
  • On my system, this is a little faster: `system.time({n = nrow(m2); do.call(paste0, lapply(seq_len(ncol(m2)), function(j) m2[seq(to=j*n, length.out=n)]))})`. Certainly more convoluted, though. – Frank Apr 21 '16 at 03:02