5

I am trying to efficiently compute rowMaxs in Rcpp. A very simple implementation is

arma::mat RcppRowmaxs(arma::mat x){  

  int N = x.n_rows;
  arma::mat rm(N,1);

  for(int nn = 0; nn < N; nn++){
      rm(nn) = max(x.row(nn));
  }

  return(rm);
}

which works perfectly fine. However, comparing this function to other packages, it turned out that other implementations are by far more efficient. Specifically, Rfast::rowMaxs is more than 6 times faster than the simple Rcpp implementation!

Naturally, I tried to mimic the behavior of Rfast. However, as a beginner in Rcpp, I only tried to load Rfast::rowMaxs directly in Rcpp as described e.g. here. Unfortunately, using an Rcpp script to load an R function that again calls an Rcpp script seems pretty slow following my benchmark (see row "RfastinRcpp"):

m = matrix(rnorm(1000*1000),1000,1000)

microbenchmark::microbenchmark(

  matrixStats    = matrixStats::rowMaxs(m),
  Rfast          = Rfast::rowMaxs(m,value=T),
  Rcpp           = RcppRowmaxs(m),
  RfastinRcpp    = RfastRcpp(m),
  apply          = apply(m,1,max)

)

Unit: microseconds
        expr       min         lq       mean     median        uq        max neval cld
 matrixStats  1929.570  2042.8975  2232.1980  2086.5180  2175.470   4025.923   100 a  
       Rfast   666.711   727.2245   842.5578   795.2215   891.443   1477.969   100 a  
        Rcpp  5552.216  5825.4855  6186.9850  5997.8295  6373.737   8568.878   100  b 
 RfastinRcpp  7495.042  7931.2480  9471.8453  8382.6350 10659.672  19968.817   100  b 
       apply 12281.758 15145.7495 22015.2798 17202.9730 20310.939 136844.591   100   c

Any tips on how to improve performance in the function I've provided above? I've looked at the source code from Rfast and believe that this is the correct file. However, so far I did not manage to locate the important parts of the code.

Edit: Changed the post to focus on Rfast now, following the answer of Michail.

yrx1702
  • 1,619
  • 15
  • 27
  • I guess `x.row()` is accessing a copy of the data? – F. Privé Apr 13 '20 at 07:52
  • 3
    You can do better with just returning the result of `arma::max(x)`, though it still isn't quite as fast as `matrixStats::RowMaxs()`; with 1000 x 1000 matrix on my machine, median execution time for your code is 4 milliseconds, for `arma::max(x)` is 2.76 milliseconds, and for `matrixStats::RowMaxs()` is 1.94 milliseconds. So, since I haven't studied the `matrixStats` vs. `arma` source closely, I can't say for sure what makes `matrixStats` faster, but I don't think it could (just) be `x.row()` causing copies. Interesting – duckmayr Apr 13 '20 at 11:31
  • 1
    You could use parallel to reduce your for loop time. – Manos Papadakis Apr 15 '20 at 14:32

2 Answers2

2

I just did some experiments on my laptop. I have an 5 year old HP with 2 intel i5 cores at 2.3 GHz. Attached is an picture with my results. Rfast's implementation is way faster than matrixStats' implementation, always and as the matrix gets bigger the time difference increases.

enter image description here

Michail
  • 177
  • 1
  • 6
  • Could you post a reproducible example instead of a picture? For example, where does `matrnorm` come from (I know it comes from `Rfast`, but the general reader has no idea). It might also help to explain what `value = T` does. – Joseph Wood Apr 14 '20 at 23:26
  • 2
    @ManosPapadakis, I understand that. But if I want to confirm the results, it makes it very difficult to reproduce. I would have to hand type everything in the picture as opposed to simply copying and pasting a snippet of code. Check out the **Complete** section here: https://stackoverflow.com/help/minimal-reproducible-example. It states: **"DO NOT use images of code. Copy the actual text from your code editor, paste it into the question, then format it as code. This helps others more easily read and test your code."** – Joseph Wood Apr 15 '20 at 11:13
  • 1
    Whoops, that's my bad. `value=T` tells Rfast::rowMaxs to return the actual value of the maximum of each row. `value=F` returns the index of the maximum (which is way slower and, unfortunately, the default value). – yrx1702 Apr 15 '20 at 11:41
  • 2
    Changed the posting now to focus on the Rfast implementation, which is even more impressive in terms of speed! – yrx1702 Apr 15 '20 at 11:51
  • 1
    I could add ```rowMax``` as a linking function. Then you could use it using Rcpp's depend mechanism. I will think about for the next version. – Manos Papadakis Apr 15 '20 at 14:28
  • Joseph I added the picture because I did not know how to add the code. I found out today. It would be nice if there was an example of how to do it. Secondly, this forum gives th option to upload a picture. If they do not allow it, why do they give the option? – Michail Apr 15 '20 at 14:47
  • And third, you see the code in the picture. You can simply type it in yourself. Why is it difficult to reproduce it? – Michail Apr 15 '20 at 14:48
1
library(Rfast)
library(microbenchmark)
library(matrixStats)

x <- matrnorm(100,100)
microbenchmark(Rfast::rowMaxs(x,value=TRUE), matrixStats::rowMaxs(x),times=10)
Unit: microseconds
                          expr  min   lq   mean median   uq    max neval
Rfast::rowMaxs(x, value = TRUE) 20.5 20.9 242.64  21.50 23.2 2223.8  10
        matrixStats::rowMaxs(x) 43.7 44.7 327.43  46.95 88.2 2776.8  10

x <- matrnorm(1000,1000)
microbenchmark(Rfast::rowMaxs(x,value=TRUE), matrixStats::rowMaxs(x),times=10)
Unit: microseconds
                            expr    min     lq    mean median     uq    max neval
 Rfast::rowMaxs(x, value = TRUE)  799.5  844.0  875.08  858.5  900.3  960.3    10
         matrixStats::rowMaxs(x) 2229.8 2260.8 2303.04 2269.4 2293.3 2607.8    10

x <- matrnorm(10000,10000)
microbenchmark(Rfast::rowMaxs(x,value=TRUE), matrixStats::rowMaxs(x),times=10)
Unit: milliseconds
                            expr      min       lq      mean    median       uq      max neval
 Rfast::rowMaxs(x, value = TRUE)  82.1157  83.4288  85.81769  84.57885  86.2742  93.0031    10
         matrixStats::rowMaxs(x) 216.0003 218.5324 222.46670 221.00330 225.3302 237.7666    10
Michail
  • 177
  • 1
  • 6