2

I want to sort each row of this matrix in descending order:

 > set.seed(123); a <- matrix(rbinom(100,10,0.3),ncol=10)

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    2    6    5    6    1    1    4    4    2     1
 [2,]    4    3    4    5    3    3    1    3    4     4
 [3,]    3    4    3    4    3    4    3    4    3     2
 [4,]    5    3    7    4    2    1    2    0    4     4
 [5,]    5    1    4    0    2    3    4    3    1     2
 [6,]    1    5    4    3    1    2    3    2    3     2
 [7,]    3    2    3    4    2    1    4    2    6     4
 [8,]    5    1    3    2    3    4    4    3    5     1
 [9,]    3    2    2    2    2    5    4    2    5     3
[10,]    3    6    1    2    5    2    3    1    2     3

but my attempt returns only a single index:

> do.call(order,as.list(a[1,],a[2,]))
[1] 1

How can I sort the matrix using `do.call` and `order`?

Edit: fixed the matrix above so that it matches the code above.

Jaap
  • Are you trying to sort each row independently or by multiple rows while preserving the matrix structure? – Joseph Wood Jan 11 '17 at 15:08
  • Maybe [this helps](http://stackoverflow.com/questions/10508352/how-to-sort-a-matrix-in-r-row-wise) – Sotos Jan 11 '17 at 15:09
  • @JosephWood sort each row independently. I currently do it with `apply(1,function(x) order(x,decreasing=T))` but it is too slow. – Regan Alpha Jan 11 '17 at 15:12
  • @Sotos I read that but the sort is in some very odd order. – Regan Alpha Jan 11 '17 at 15:12
  • A simple `for` loop would probably do pretty well here. Something like `for(x in seq_len(nrow(a))) a[x,] <- order(a[x,],decreasing=T)`. – lmo Jan 11 '17 at 15:40
  • Since you are only ordering on one vector at a time, it may be even faster to use `sort.list` as it allows for the "radix" algorithm which is one of the elements in `data.table`'s secret sauce for super speed: `for(x in seq_len(nrow(a))) a[x,] <- sort.list(a[x,],decreasing=T, method="radix")`. – lmo Jan 11 '17 at 15:57

3 Answers

4

Two alternatives:

# Jaap
do.call(rbind, lapply(split(a, row(a)), sort, decreasing = TRUE))

# adaptation of lmo's solution from the comments
for(i in 1:nrow(a)) a[i,] <- a[i,][order(a[i,], decreasing = TRUE)]

gives:

   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
1     6    6    5    4    4    2    2    1    1     1
2     5    4    4    4    4    3    3    3    3     1
3     4    4    4    4    3    3    3    3    3     2
4     7    5    4    4    4    3    2    2    1     0
5     5    4    4    3    3    2    2    1    1     0
6     5    4    3    3    3    2    2    2    1     1
7     6    4    4    4    3    3    2    2    2     1
8     5    5    4    4    3    3    3    2    1     1
9     5    5    4    3    3    2    2    2    2     2
10    6    5    3    3    3    2    2    2    1     1
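
As for why the `do.call(order, ...)` attempt in the question returns `1`: `as.list()` converts only its first argument, so `a[2,]` is silently dropped and the ten elements of `a[1,]` become ten separate length-one sort keys. A minimal sketch of that behaviour (the names `keys`, `idx`, `cols` and `row_order` are only illustrative):

```r
set.seed(123)
a <- matrix(rbinom(100, 10, 0.3), ncol = 10)

# as.list() converts only its first argument; a[2, ] is ignored
keys <- as.list(a[1, ], a[2, ])
length(keys)                 # 10 list elements, each of length one

# equivalent to order(a[1, 1], a[1, 2], ..., a[1, 10]):
# ten keys for a single observation, so the only possible index is 1
idx <- do.call(order, keys)

# do.call + order is suited to reordering whole rows by several columns,
# which is a different task from sorting within each row
cols <- lapply(seq_len(ncol(a)), function(j) a[, j])
row_order <- do.call(order, cols)
a_rows_reordered <- a[row_order, ]
```

So `do.call(order, ...)` on its own does not sort within rows, which is why the alternatives above call `sort` or `order` once per row.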

A benchmark with:

library(microbenchmark)
microbenchmark(dc.lapply.sort = do.call(rbind, lapply(split(a, row(a)), sort, decreasing = TRUE)),
               t.apply.sort = t(apply(a, 1, sort, decreasing = TRUE)),
               for.order = for(i in 1:nrow(a)) a[i,] <- a[i,][order(a[i,], decreasing = TRUE)],
               for.sort = for(i in 1:nrow(a)) a[i,] <- sort(a[i,], decreasing = TRUE),
               for.sort.list = for(x in seq_len(nrow(a))) a[x,] <- a[x,][sort.list(a[x,], decreasing = TRUE, method="radix")])

gives:

Unit: microseconds
           expr     min       lq      mean   median       uq      max neval cld
 dc.lapply.sort 189.811 206.5890 222.52223 217.8070 228.0905  332.034   100   c
   t.apply.sort 185.474 200.4515 212.59608 210.4930 220.0025  286.288   100  bc
      for.order  82.631  91.1860  98.66552  97.8475 102.9680  176.666   100 a  
       for.sort 167.939 187.5025 192.90728 192.1195 198.8690  256.494   100  b 
  for.sort.list 187.617 206.4475 230.82960 215.7060 221.6115 1541.343   100   c

Benchmarks on a 10 x 10 matrix say little about how the methods scale, so here is the same comparison on a larger matrix (100,000 rows by 10 columns):

set.seed(123)
a <- matrix(rbinom(10e5, 10, 0.3), ncol = 10)

microbenchmark(dc.lapply.sort = do.call(rbind, lapply(split(a, row(a)), sort, decreasing = TRUE)),
               t.apply.sort = t(apply(a, 1, sort, decreasing = TRUE)),
               for.order = for(i in 1:nrow(a)) a[i,] <- a[i,][order(a[i,], decreasing = TRUE)],
               for.sort = for(i in 1:nrow(a)) a[i,] <- sort(a[i,], decreasing = TRUE),
               for.sort.list = for(x in seq_len(nrow(a))) a[x,] <- a[x,][sort.list(a[x,], decreasing = TRUE, method="radix")],
               times = 10)

gives:

Unit: seconds
           expr      min       lq     mean   median       uq      max neval  cld
 dc.lapply.sort 6.790179 6.924036 7.036330 7.013996 7.121343 7.351729    10    d
   t.apply.sort 5.032052 5.057022 5.151560 5.081459 5.177159 5.538416    10   c 
      for.order 1.368351 1.463285 1.514652 1.471467 1.583873 1.736544    10 a   
       for.sort 5.028314 5.102993 5.317597 5.154104 5.348614 6.123278    10   c 
  for.sort.list 2.417857 2.464817 2.573294 2.519408 2.726118 2.815964    10  b  

Conclusion: the for-loop in combination with order is still the fastest solution.
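
Since the question asks about `order` specifically, a loop-free variant is to let a single `order` call sort by row index first and by negated value second. This is only a sketch: it is not part of the benchmarks above, so its speed relative to the loop is untested here, and it assumes numeric entries because it relies on `-a`.

```r
set.seed(123)
a <- matrix(rbinom(100, 10, 0.3), ncol = 10)

# order the linear indices by row number first, then by descending value
idx <- order(row(a), -a)

# a[idx] lists row 1 sorted, then row 2 sorted, and so on;
# refill the matrix row by row
sorted <- matrix(a[idx], nrow = nrow(a), byrow = TRUE)
```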


The `order2` and `sort2` functions of the `grr` package can give a further speed improvement. Comparing them with the fastest solution from above:

set.seed(123)
a <- matrix(rbinom(10e5, 10, 0.3), ncol = 10)

microbenchmark(for.order = for(i in 1:nrow(a)) a[i,] <- a[i,][order(a[i,], decreasing = TRUE)],
               for.order2 = for(i in 1:nrow(a)) a[i,] <- a[i,][rev(grr::order2(a[i,]))],
               for.sort2 = for(i in 1:nrow(a)) a[i,] <- rev(grr::sort2(a[i,])),
               times = 10)

giving:

Unit: milliseconds
       expr       min        lq      mean    median        uq      max neval cld
  for.order 1243.8140 1263.4423 1316.4662 1305.1823 1378.5836 1404.251    10   c
 for.order2  956.1536  962.8226 1110.1778 1090.9984 1233.4241 1368.416    10  b 
  for.sort2  830.1887  843.6765  920.5668  847.1601  972.8703 1144.135    10 a  
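
One caveat about the loop-based entries: they assign into `a`, so if the benchmark expressions are evaluated in the workspace where `a` lives, every repetition after the first works on rows that are already sorted. A hedged sketch of one way to keep the input untouched is to wrap the loop in a function (the name `sort_rows_order` is hypothetical), which modifies only its local copy:

```r
# hypothetical helper: sorts each row in descending order on a local copy
sort_rows_order <- function(m) {
  for (i in seq_len(nrow(m))) {
    m[i, ] <- m[i, ][order(m[i, ], decreasing = TRUE)]
  }
  m
}

set.seed(123)
a <- matrix(rbinom(100, 10, 0.3), ncol = 10)
a_before <- a

res <- sort_rows_order(a)

identical(a, a_before)   # the original matrix is left unchanged
```

Timing `sort_rows_order(a)` instead of the bare loop would give every repetition the same unsorted input, at the cost of including the copy in the measurement.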
Jaap
  • Or `t(apply(a, 1, sort, decreasing = TRUE))` – Aurèle Jan 11 '17 at 15:26
  • I cannot understand the sort. The first row of `a` in decreasing order by `sort(c(3, 5, 3, 3, 2, 4, 3, 3, 4, 2))` is `2 2 3 3 3 3 3 4 4 5`, which is nothing like `1 1 1 2 2 4 4 5 6 6`. What is this doing here? – Regan Alpha Jan 11 '17 at 15:34
  • How do you make this into decreasing order? `do.call(rbind, lapply(split(a, row(a)), sort(decreasing=TRUE)))` throws an error that `x` is missing. This works in ascending order but not descending. – Regan Alpha Jan 11 '17 at 15:42
  • @ReganAlpha updated; also note that the code you gave to produce the matrix doesn't give the matrix you included in the question – Jaap Jan 11 '17 at 15:44
  • Also give my `sort.list` comment a try if you get a chance. – lmo Jan 11 '17 at 16:00
  • @lmo it isn't faster, see the update (btw: your solutions are returning the order not the ordered elements) – Jaap Jan 11 '17 at 16:07
  • Huh, it's kind of surprising that the radix sort option in `sort.list` is slower than a vanilla `order` call. Maybe this is a scaling issue. Thanks for adding it. (I wasn't quite sure what the user wanted from the original post: the order of the items or the items reordered.) – lmo Jan 11 '17 at 16:13
  • @lmo Ran it on a `10 x 1e6` matrix out of curiosity and the `for.sort.list` solution is still not a winner – Aurèle Jan 11 '17 at 16:23
  • Good to know. Does it get closer to the `for` loop with `order`, or still closer to the apply solutions? – lmo Jan 11 '17 at 16:28
  • @lmo included an updated benchmark: `for`+`order` is still faster; I've also run a benchmark between just those two on an even larger matrix: `sort.list` is still about 2/3 slower – Jaap Jan 11 '17 at 20:01
  • Thanks for including that. It gets a lot closer to `for`+`order`, and now clearly surpasses `for`+`sort` but its fastest run is still slower than `order`'s slowest. – lmo Jan 11 '17 at 20:10
  • I implemented the methods on a large sparse matrix and the apply method needs a lot of RAM while the for-loop methods require a lot of CPU time, less RAM. Both methods are slow. Not sure yet which method is fastest when hitting the upper limits of RAM/CPU. What do you think? – hhh Jan 12 '17 at 07:55
  • @hhh I'm not sure about that. Maybe the `order2` and `sort2` functions from the `grr`-package can help. See the update at the end of the answer. – Jaap Jan 14 '17 at 21:15
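
Two points raised in the comments above can be checked directly: `order` returns positions whereas `sort` returns the values, and extra arguments such as `decreasing` are passed to `lapply`, which forwards them to `sort`. A small sketch using the vector from the comments (the list `rows` is made up for illustration):

```r
x <- c(3, 5, 3, 3, 2, 4, 3, 3, 4, 2)

o <- order(x, decreasing = TRUE)   # positions of the elements
s <- sort(x, decreasing = TRUE)    # the elements themselves
identical(x[o], s)                 # indexing by the order gives the sorted values

# extra arguments go to lapply(), which forwards them to sort();
# writing sort(decreasing = TRUE) would call sort() at once, without x
rows <- list(c(2, 6, 5), c(4, 3, 4))
sorted_rows <- lapply(rows, sort, decreasing = TRUE)
```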
-1

t(apply(a, 1, sort, decreasing = TRUE)) gives:

#       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#  [1,]    6    6    5    4    4    2    2    1    1     1
#  [2,]    5    4    4    4    4    3    3    3    3     1
#  [3,]    4    4    4    4    3    3    3    3    3     2
#  [4,]    7    5    4    4    4    3    2    2    1     0
#  [5,]    5    4    4    3    3    2    2    1    1     0
#  [6,]    5    4    3    3    3    2    2    2    1     1
#  [7,]    6    4    4    4    3    3    2    2    2     1
#  [8,]    5    5    4    4    3    3    3    2    1     1
#  [9,]    5    5    4    3    3    2    2    2    2     2
# [10,]    6    5    3    3    3    2    2    2    1     1
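
The `t()` is needed because `apply` over rows returns each result as a column. A short sketch of that step (the names `s`, `row1` and `sorted` are only illustrative):

```r
set.seed(123)
a <- matrix(rbinom(100, 10, 0.3), ncol = 10)

s <- apply(a, 1, sort, decreasing = TRUE)

# column i of s holds the sorted values of row i of a
row1 <- sort(a[1, ], decreasing = TRUE)
all(s[, 1] == row1)

sorted <- t(s)   # transpose back so that rows correspond to rows of a
```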
Aurèle
  • Which one do you think is faster? I tried with this small example `> system.time(t(apply(a, 1, sort, decreasing = TRUE))) user system elapsed 0 0 0 > system.time(t(apply(a, 1, function(x) order(x,decreasing=T)))) user system elapsed 0 0 0 ` but no answer yet. Is Jaap's solution faster? :/ – Regan Alpha Jan 11 '17 at 15:53
  • You can `microbenchmark` them, they're the same: `microbenchmark::microbenchmark( t(apply(a, 1, sort, decreasing = TRUE)), do.call(rbind, lapply(split(a, row(a)), sort, decreasing = TRUE)), times = 1000L )`. Mean time: 330 microseconds, Median time: 300 microseconds for both, on my machine – Aurèle Jan 11 '17 at 15:57
-1

I also ran a microbenchmark, and it seems that the `order` solutions win :)

>     microbenchmark(jaap1 = do.call(rbind, lapply(split(a, row(a)), sort, decreasing = TRUE)),
+                    apom = t(apply(a, 1, sort, decreasing = TRUE)),
+                    jaap2 = for(i in 1:nrow(a)) a[i,] <- a[i,][order(a[i,], decreasing = TRUE)],
+                    jaap3 = for(i in 1:nrow(a)) a[i,] <- sort(a[i,], decreasing = TRUE), 
+                    alpha = t(apply(a, 1, function(x) order(x, decreasing = T))),
+                    times = 1000L)
Unit: microseconds
  expr     min       lq     mean   median       uq      max neval
 jaap1 318.193 364.6125 404.3224 389.5845 417.6405 1422.087  1000
  apom 276.764 340.2740 389.1302 364.9650 398.3680 2854.710  1000
 jaap2 121.332 158.4845 189.5616 182.2070 202.2390 1170.602  1000
 jaap3 247.387 309.2445 351.6959 332.2710 365.3640 1361.720  1000
 alpha 139.244 178.7460 209.6122 202.8580 226.7585 1092.301  1000
  • To emphasize: not just `order` solution, but a `for` loop. – lmo Jan 11 '17 at 16:14
  • @lmo: I cannot understand how to use the for-loop with undefined variables like [here](http://pastie.org/private/c2njroqyayydf4wcgoicva). I use dummy variables to store intermediate results, but I cannot refer to them like `mmm[1,]` when they are not yet initialized, so with a for-loop do I need to initialize the variables first somehow? – Regan Alpha Jan 11 '17 at 16:30
  • I don't quite follow your question. You do need to initiate variables before referring to them. If you know the size and shape of an object beforehand, you should create it. for example to initialize a above, you could use `a <- matrix(0, 10, 10)` before running a loop and then fill it in. If you are asking something different and do not find a solution after doing a bit of searching on SO, it may be worth posting as a new question. – lmo Jan 11 '17 at 17:02
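
The initialization advice in the last comment can be sketched as follows, assuming the result should have the same shape as `a` (the name `res` is only illustrative):

```r
set.seed(123)
a <- matrix(rbinom(100, 10, 0.3), ncol = 10)

# create the result with its final shape before the loop
res <- matrix(0L, nrow = nrow(a), ncol = ncol(a))

# fill it in one row at a time
for (i in seq_len(nrow(a))) {
  res[i, ] <- sort(a[i, ], decreasing = TRUE)
}
```

Writing into a separate `res` also leaves the original `a` unchanged.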