How can I collapse all the columns in a matrix to its first column?

Question

I have this matrix

> matrix(letters[1:10],2)
     [,1] [,2] [,3] [,4] [,5]
[1,] "a"  "c"  "e"  "g"  "i" 
[2,] "b"  "d"  "f"  "h"  "j"

And I would like to have this output

    [,1] 
[1,] "acegi" 
[2,] "bdfhj"

Thus, basically a vector. How can I do that? I was trying something like `apply(matrix(letters[1:10],2),2,paste0)`, but it does not work.

apply with 1 in the second argument instead of 2? – Frank Apr 21 '16 at 01:56
Nope, that just applies the function row-wise. – Dambo Apr 21 '16 at 01:57
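A minimal sketch of why the attempt in the question does not work, using the question's own matrix: `paste0()` called on a single vector without `collapse` returns that vector element by element, so applying it over columns (`MARGIN = 2`) just rebuilds the original matrix. Joining requires `collapse = ""`, and the joining has to run across each row (`MARGIN = 1`).

```r
m <- matrix(letters[1:10], 2)

# The attempt from the question: each column passes through paste0() unchanged,
# and apply() reassembles the columns into the same 2 x 5 matrix.
attempt <- apply(m, 2, paste0)
print(identical(dim(attempt), dim(m)) && all(attempt == m))

# paste0() only joins the elements of one vector when collapse is supplied.
print(paste0(m[1, ]))                 # "a" "c" "e" "g" "i"
print(paste0(m[1, ], collapse = ""))  # "acegi"

# Collapsing each row (MARGIN = 1) gives the desired strings.
rows_joined <- apply(m, 1, paste0, collapse = "")
print(rows_joined)
```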

Gopala · Answer 1 · score 7 · 2016-04-21T02:15:29.050

This seems to do what you want:

m <- matrix(letters[1:10], 2)
matrix(apply(m, 1, function(x) paste(x, collapse = '')))
    [,1]   
[1,] "acegi"
[2,] "bdfhj"

answered Apr 21 '16 at 02:03, edited Apr 21 '16 at 02:15

No need for the anonymous function, `apply(m, 1, paste, collapse = "")` – Rich Scriven Apr 21 '16 at 02:10
Yeah, just a habit....bad one perhaps. :) – Gopala Apr 21 '16 at 02:11
That collapses column wise though. If that is what you want, no problem. – Gopala Apr 21 '16 at 02:13
Edited...It is ignored. – Gopala Apr 21 '16 at 02:16
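To illustrate Rich Scriven's comment: `apply()` passes any extra arguments given after `FUN` through its `...` on to `FUN`, so `collapse = ""` can be supplied directly instead of through an anonymous function. A small check on the question's matrix, which also shows that `matrix()` applied to a plain vector defaults to a single column:

```r
m <- matrix(letters[1:10], 2)

# Anonymous-function version from the answer above.
with_wrapper <- apply(m, 1, function(x) paste(x, collapse = ""))

# Extra arguments after FUN are forwarded to paste() by apply().
without_wrapper <- apply(m, 1, paste, collapse = "")

print(identical(with_wrapper, without_wrapper))  # TRUE

# matrix() with only a data vector produces a one-column matrix.
out <- matrix(without_wrapper)
print(dim(out))  # 2 1
```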

score 6 · Accepted Answer · edited Jun 20 '20 at 09:12

Another option is to convert the matrix to a data.frame and use `do.call` with `paste0`:

m1 <- matrix(letters[1:10], 2)
matrix(do.call(paste0, as.data.frame(m1)))
#     [,1]   
#[1,] "acegi"
#[2,] "bdfhj"

NOTE: This is faster than looping through each row with `apply`, at least when there are more rows than columns (see the benchmarks below).
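A short sketch of why this works without a per-row loop: a data frame is a list with one element per column, and `do.call()` splices that list into separate arguments, so the call is equivalent to writing out `paste0(m1[, 1], m1[, 2], ...)` by hand. Because `paste0()` is vectorised over its arguments, all rows are joined in a single call rather than one call per row.

```r
m1 <- matrix(letters[1:10], 2)

cols <- as.data.frame(m1)
print(length(cols))  # 5: one list element per matrix column

# do.call() turns each column into its own argument to paste0().
via_do_call <- do.call(paste0, cols)

# The same call written out explicitly.
by_hand <- paste0(m1[, 1], m1[, 2], m1[, 3], m1[, 4], m1[, 5])

print(identical(via_do_call, by_hand))  # TRUE
```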

Benchmarks

set.seed(24)
m2 <- matrix(sample(letters, 1e7*4, replace=TRUE), ncol=4)
system.time(matrix(apply(m2, 1, paste, collapse="")))
#   user  system elapsed 
#  75.81    0.27   76.44 

system.time(matrix(do.call(paste0, as.data.frame(m2))))
#  user  system elapsed 
#   9.62    0.14    9.76

Using @Frank's variation

system.time(matrix(do.call(paste0, split(m2, col(m2)))))
#  user  system elapsed 
#  9.54    0.19    9.75
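Frank's variation skips the data frame: `col(m)` returns the column index of every element, and `split()` uses it to group the matrix's values into a list of columns, which `do.call()` then hands to `paste0()` just as before. On the question's matrix:

```r
m <- matrix(letters[1:10], 2)

print(col(m))  # same shape as m; each entry is its column number

cols <- split(m, col(m))  # list named "1".."5", one character vector per column
print(cols[["3"]])         # "e" "f", i.e. m[, 3]

print(do.call(paste0, cols))
```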

As @PierreLafortune wanted to check with a dataset with more columns:

set.seed(49)
m2 <- matrix(sample(letters, 1e6*10, replace=TRUE), ncol=10)
system.time(matrix(apply(m2, 1, paste, collapse="")))
#  user  system elapsed 
#  8.90    0.00    8.89 
system.time(matrix(do.call(paste0, as.data.frame(m2))))
#  user  system elapsed 
#  1.92    0.00    1.92
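Timings only matter if the approaches agree. A quick sanity check, assuming a small random matrix (`m3` and its 200 x 6 size are arbitrary, hypothetical choices for illustration), that the `apply`, `as.data.frame` and `split` versions return identical strings:

```r
set.seed(1)
# Hypothetical small test matrix: 200 rows, 6 columns of random letters.
m3 <- matrix(sample(letters, 200 * 6, replace = TRUE), ncol = 6)

by_apply <- apply(m3, 1, paste, collapse = "")
by_df    <- do.call(paste0, as.data.frame(m3))
by_split <- do.call(paste0, split(m3, col(m3)))

print(identical(by_apply, by_df) && identical(by_apply, by_split))
```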

With many more columns, e.g. 5000*5000 values arranged as a 25000 x 1000 matrix (`ncol=1000` below):

set.seed(37)
m2 <- matrix(sample(letters, 5000*5000, replace=TRUE), ncol=1000)
system.time(matrix(apply(m2, 1, paste, collapse="")))
#   user  system elapsed 
#  5.42    0.00    5.42 
system.time(matrix(do.call(paste0, as.data.frame(m2))))
#  user  system elapsed 
#  7.42    0.00    7.43 
system.time({n = nrow(m2)
     do.call(paste0, lapply(seq_len(ncol(m2)),
       function(j) m2[seq(to=j*n, length.out=n)]))})
#  user  system elapsed 
#  6.19    0.00    6.20

The `apply` method is slightly faster here, but I assume that the OP's dataset has more rows than columns.
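The third timing above uses Frank's index-based variant from the comments. It relies on R storing matrices in column-major order: with `n` rows, column `j` occupies linear positions `(j - 1) * n + 1` through `j * n`, which is exactly the range `seq(to = j * n, length.out = n)` produces. A small sketch on the question's matrix:

```r
m <- matrix(letters[1:10], 2)
n <- nrow(m)

# Linear positions of column 3 in a 2-row matrix.
idx <- seq(to = 3 * n, length.out = n)
print(idx)     # 5 6
print(m[idx])  # "e" "f", the same as m[, 3]

# Extract every column by position, then paste them together row-wise.
cols <- lapply(seq_len(ncol(m)), function(j) m[seq(to = j * n, length.out = n)])
print(do.call(paste0, cols))
```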

:) No, upvoted and then I'm like "Wow, can that really be? Lemme check". Still finding it kind of weird. It even does as well as `do.call(paste0, split(m2, col(m2)))` — Frank, Apr 21 '16 at 02:44
Could it be the 4 columns? Try with more cols to see the data frame overhead maybe — Pierre L, Apr 21 '16 at 02:46
I guess the minimum is around `system.time(paste0(m2[,1], m2[,2], m2[,3], m2[,4]))` and you're only 50% above that. It could probably be achieved with eval parse stuff (but I wouldn't do that). — Frank, Apr 21 '16 at 02:48
Thanks for the exhaustive answer. For what it's worth, the fact that your solution was faster was quite counterintuitive to me. — Dambo, Apr 21 '16 at 02:55
On my system, this is a little faster: `system.time({n = nrow(m2); do.call(paste0, lapply(seq_len(ncol(m2)), function(j) m2[seq(to=j*n, length.out=n)]))})`. Certainly more convoluted, though. — Frank, Apr 21 '16 at 03:02
