1

so I'm really confused about the recommended way to pass sparse Matrices from R to c++. I was under the impression that sp_mat was the correct argument type to do it as in the code below

testCode = '                                                                                                                                                                                                                                                           
#include <RcppArmadillo.h>                                                                                                                                                                                                                                             
// [[Rcpp::depends(RcppArmadillo)]]                                                                                                                                                                                                                                    

// [[Rcpp::export]]                                                                                                                                                                                                                                                    
void testFun(arma::sp_mat F){                                                                                                                                                                                                                                          

  Rcpp::Rcout << "F has " << F.n_rows << " rows" << std::endl;                                                                                                                                                                                                         

}'

Rcpp::sourceCpp(code = testCode)

n = 70000
M = Matrix::sparseMatrix(i=c(n), j=c(n), x=c(1))

testFun(M)

However, running this code generates the following error:

error: SpMat::init(): requested size is too large
Error in testFun(M) : SpMat::init(): requested size is too large
Calls: testFun -> .Call
Execution halted

I saw the example in https://gallery.rcpp.org/articles/armadillo-sparse-matrix/ but I'm not sure if it is saying that every time we pass a sparse Matrix to c++ we should use the function provided there? Thanks for a clarification!

marcin_j
  • 414
  • 4
  • 18
  • You have a run-time error. N=70000 does not pass. Try a smaller N. Look at the existing example for how to use this -- note eg that the very link you gave has _index vectors_ as arguments for `i` and `j`. – Dirk Eddelbuettel Mar 24 '20 at 20:49
  • I mean the whole point is that I **need** my matrix to have N this big (or even bigger actually). Let me change the example using vectors, but this is really not the problem. The matrix I actually need to pass is an instance of 'dgCMatrix' created automatically. – marcin_j Mar 24 '20 at 20:52
  • A _dense_ matrix will have rows * cols * bytes. A _sparse_ matrix, essentially, has 3 * N * 8 as it keeps index vectors along with cell values. So you can have a _nominal_ size of 70k x 70k and still fit it _if you actually provide the cell indices_ as a sparse format requires. – Dirk Eddelbuettel Mar 24 '20 at 21:03
  • yes, exactly! This is precisely why I'm using a sparse matrix here! The matrix M in this example will have dimensions 70k by 70k but only a single nonzero value - which is why I'm surprised by the error. Looks like I'm not understanding something because it seems to me that my example (now that I changed indices to vectors - didn't help) is the same as the one posted in Rcpp gallery, except for the _nominal_ dimensions of the matrix. – marcin_j Mar 24 '20 at 21:09
  • (I am by no means all that familiar with sparse matrices and all their details but I think it is user error. See answer I will just post...) – Dirk Eddelbuettel Mar 24 '20 at 21:51
  • 1
    ok, I think I found the answer looking at the armadillo source code. It turns out it is also described here: https://stackoverflow.com/questions/40592054/large-matrices-in-rcpparmadillo-via-the-arma-64bit-word-define – marcin_j Mar 24 '20 at 21:55

2 Answers2

2

Ok, so I think I foud the answer. Basically armadillo throws this error if the total number of elements is larger than the size of the variable that stores the number of elements as seen here: https://gitlab.com/conradsnicta/armadillo-code/-/blob/9.900.x/include/armadillo_bits/SpMat_meat.hpp

Someone realized this before and provided a solution here: Large Matrices in RcppArmadillo via the ARMA_64BIT_WORD define

marcin_j
  • 414
  • 4
  • 18
1

If you revisit the basic example in the Rcpp Gallery and set up one or two sparse Matrix objects, it becomes apparent that a high value for j leads to a full expansion in the p slot (examine the object returned from sparseMatrix).

So here is a simpler example with a (fairly high still) i value but a low j. I think you should be able to take it from here:

Code

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp ;

// [[Rcpp::export]]
void convertSparse(S4 mat) {

    // obtain dim, i, p. x from S4 object
    IntegerVector dims = mat.slot("Dim");
    arma::urowvec i = Rcpp::as<arma::urowvec>(mat.slot("i"));
    arma::urowvec p = Rcpp::as<arma::urowvec>(mat.slot("p"));
    arma::vec x     = Rcpp::as<arma::vec>(mat.slot("x"));

    int nrow = dims[0], ncol = dims[1];

    // use Armadillo sparse matrix constructor
    arma::sp_mat res(i, p, x, nrow, ncol);
    Rcout << "SpMat res:\n" << res << std::endl;
}

// [[Rcpp::export]]
void convertSparse2(S4 mat) {         // slight improvement with two non-nested loops

    IntegerVector dims = mat.slot("Dim");
    arma::urowvec i = Rcpp::as<arma::urowvec>(mat.slot("i"));
    arma::urowvec p = Rcpp::as<arma::urowvec>(mat.slot("p"));
    arma::vec x     = Rcpp::as<arma::vec>(mat.slot("x"));

    int nrow = dims[0], ncol = dims[1];
    arma::sp_mat res(nrow, ncol);

    // create space for values, and copy
    arma::access::rw(res.values) = arma::memory::acquire_chunked<double>(x.size() + 1);
    arma::arrayops::copy(arma::access::rwp(res.values), x.begin(), x.size() + 1);

    // create space for row_indices, and copy
    arma::access::rw(res.row_indices) = arma::memory::acquire_chunked<arma::uword>(i.size() + 1);
    arma::arrayops::copy(arma::access::rwp(res.row_indices), i.begin(), i.size() + 1);

    // create space for col_ptrs, and copy
    arma::access::rw(res.col_ptrs) = arma::memory::acquire<arma::uword>(p.size() + 2);
    arma::arrayops::copy(arma::access::rwp(res.col_ptrs), p.begin(), p.size() + 1);

    // important: set the sentinel as well
    arma::access::rwp(res.col_ptrs)[p.size()+1] = std::numeric_limits<arma::uword>::max();

    // set the number of non-zero elements
    arma::access::rw(res.n_nonzero) = x.size();

    Rcout << "SpMat res:\n" << res << std::endl;
}


/*** R
suppressMessages({
  library(methods)
  library(Matrix)
})
i <- c(1,3:6)
j <- c(2,9,6:8)
x <- 5 * (1:5)
A <- sparseMatrix(i, j, x = x)
print(A)
convertSparse(A)

i <- 56789
j <- 87
x <- 42
B <- sparseMatrix(i, j, x=x)
#print(B)
convertSparse(B)
convertSparse2(B)
*/

Output

R> Rcpp::sourceCpp("~/git/stackoverflow/60838958/answer.cpp")

R> suppressMessages({library(methods); library(Matrix)})

R> i <- c(1,3:6)

R> j <- c(2,9,6:8)

R> x <- 5 * (1:5)

R> A <- sparseMatrix(i, j, x = x)

R> print(A)
6 x 9 sparse Matrix of class "dgCMatrix"

[1,] . 5 . . .  .  .  .  .
[2,] . . . . .  .  .  .  .
[3,] . . . . .  .  .  . 10
[4,] . . . . . 15  .  .  .
[5,] . . . . .  . 20  .  .
[6,] . . . . .  .  . 25  .

R> convertSparse(A)
SpMat res:
[matrix size: 6x9; n_nonzero: 5; density: 9.26%]

     (0, 1)          5.0000
     (3, 5)         15.0000
     (4, 6)         20.0000
     (5, 7)         25.0000
     (2, 8)         10.0000



R> i <- 56789

R> j <- 87

R> x <- 42

R> B <- sparseMatrix(i, j, x=x)

R> #print(B)
R> convertSparse(B)
SpMat res:
[matrix size: 56789x87; n_nonzero: 1; density: 2.02e-05%]

 (56788, 86)        42.0000



R> convertSparse2(B)
SpMat res:
[matrix size: 56789x87; n_nonzero: 1; density: 2.02e-05%]

 (56788, 86)        42.0000


R> 

Edit Indeed, good reminder. If we add

#define ARMA_64BIT_WORD 1

before includeing the RcppArmadillo.h header, then with both i and j large everything works out fine. Tail of the output below.

Updated output (just the tail end)

R> i <- 56789

R> j <- 87654

R> x <- 42

R> B <- sparseMatrix(i, j, x=x)

R> #print(B)
R> convertSparse(B)
SpMat res:
[matrix size: 56789x87654; n_nonzero: 1; density: 2.01e-08%]

 (56788, 87653)     42.0000



R> convertSparse2(B)
SpMat res:
[matrix size: 56789x87654; n_nonzero: 1; density: 2.01e-08%]

 (56788, 87653)     42.0000


R> 
Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725