
I am working on an optimization problem, and to supply the analytic gradient to the routine, I need to compute the gradient of large 3D arrays with respect to parameters. The largest of these arrays, s, has dimensions [L,N,J] where L,J ~ 2000 and N = 15. L and N index nodes over which the arrays are then aggregated, with fixed weights w, into vectors of length J. Computing the gradient naively generates an [L,N,J,J] array x whose elements are x(l,n,j,k) = -s(l,n,j)s(l,n,k) if j != k and x(l,n,j,j) = s(l,n,j)(1-s(l,n,j)).
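
In matrix notation, each slice x(l,n,:,:) is simply diag(s) - s s^T with s = s(l,n,:), which combines the two cases of the elementwise definition above. A minimal standalone Armadillo sketch of one slice, with a random vector standing in for s(l,n,:) and J = 4 purely for display:

#include <armadillo>

int main() {
  // one J-vector s(l,n,:); random data here purely for illustration
  arma::vec s_j = arma::randu<arma::vec>(4);
  // x(l,n,:,:) = diag(s) - s s^T, matching the elementwise definition above
  arma::mat x_ln = arma::diagmat(s_j) - s_j * s_j.t();
  x_ln.print("x(l,n,:,:)");
  return 0;
}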

Several functions in the procedure would use x as input, but right now I cannot keep x in memory due to its size. My approach so far has been to compute x and aggregate it over L and N on the fly, so that only J x J matrices are ever stored; the downside is that x cannot be reused in other functions. This is what the following code does:

arma::mat agg_dsnode_ddelta_v3(const arma::cube& s_lnj,
                               const arma::mat& w_ln,
                               const arma::vec& w_l){
  // cube dimensions
  unsigned int L = s_lnj.n_rows;
  unsigned int N = s_lnj.n_cols;
  unsigned int J = s_lnj.n_slices;
  
  // resulting J x J matrix
  arma::mat ds_ddelta_jj = arma::mat(J, J, arma::fill::zeros);
  
  for (unsigned int l = 0; l < L; l++) {
    
    for (unsigned int n = 0; n < N; n++) {
      
      // J-vector s(l,n,:)
      arma::vec s_j = s_lnj.subcube(arma::span(l), arma::span(n), arma::span());
      double w = w_l(l) * w_ln(l,n);
      // accumulate w * (diag(s) - s s^T) without ever forming x
      ds_ddelta_jj += arma::diagmat(w * s_j) - w * (s_j * s_j.t());
    }
  }
  return ds_ddelta_jj;
}
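
For reference, since each (l,n) term contributes a weighted outer product, the double loop above can in principle be collapsed into a single matrix product: with S the J x (L*N) matrix whose columns are sqrt(w_l(l)*w_ln(l,n)) * s(l,n,:), the outer-product part of the sum equals S S^T, a single BLAS-3 call. A hypothetical sketch of this variant (the name agg_dsnode_ddelta_outer is mine; it assumes the weights are nonnegative, and S is the same size as s_lnj, so memory use is unchanged):

#include <armadillo>
#include <cmath>

arma::mat agg_dsnode_ddelta_outer(const arma::cube& s_lnj,
                                  const arma::mat& w_ln,
                                  const arma::vec& w_l){
  unsigned int L = s_lnj.n_rows;
  unsigned int N = s_lnj.n_cols;
  unsigned int J = s_lnj.n_slices;
  
  arma::mat S(J, L * N);                       // weighted slices as columns
  arma::vec diag_acc(J, arma::fill::zeros);    // accumulates sum of w * s
  
  for (unsigned int l = 0; l < L; l++) {
    for (unsigned int n = 0; n < N; n++) {
      arma::vec s_j = s_lnj.subcube(arma::span(l), arma::span(n), arma::span());
      double w = w_l(l) * w_ln(l,n);           // assumes w >= 0
      S.col(l * N + n) = std::sqrt(w) * s_j;   // sqrt(w) so S S^T restores w
      diag_acc += w * s_j;
    }
  }
  // one J x J product replaces L*N rank-1 updates
  return arma::diagmat(diag_acc) - S * S.t();
}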

Alternatively, the 4-D array x could be computed with sparseMatrix, as below, but this approach does not scale when L and J increase: at L, J ~ 2000 and N = 15, the block-diagonal product has L*N*J^2 ≈ 1.2e11 nonzero entries, i.e. roughly a terabyte in doubles.

library(Matrix)
L = 2
N = 3
J = 4
s_lnj <- array(rnorm(L*N*J), dim=c(L,N,J))

## create sparse matrix with s(l,n,:) vertically on the diagonal 
As_lnj = sparseMatrix(i=c(1:(L*N*J)), j=rep(1:(L*N), each=J), x=as.vector(aperm(s_lnj, c(3, 1, 2))))

## create sparse matrix with s(l,n,:) horizontally on the diagonal 
Bs_lnj = sparseMatrix(i=rep(1:(L*N), each=J), j=c(1:(L*N*J)), x=as.vector(aperm(s_lnj, c(3, 1, 2))))

## create sparse matrix with s(l,n,:) on the diagonal
Cs_lnj = sparseMatrix(i=c(1:(L*N*J)), j=c(1:(L*N*J)), x=as.vector(aperm(s_lnj, c(3, 1, 2))))

## compute the 4-D array x (stored as a block-diagonal sparse matrix)
x = -(As_lnj %*% Bs_lnj) + Cs_lnj

I was wondering if you knew of a faster way to implement the first approach, or alternatively of an approach that would make the second one scalable.

Thank you in advance

pbstx412
  • I think you want to stay with the dense case, and look into using one of the optimized BLAS/LAPACK libraries such as Atlas (tuned), OpenBLAS (multithreaded), MKL (same), ... These numerical analysis routines have been optimized for decades, are best of breed, and R and (Rcpp)Armadillo automatically dispatch to them. – Dirk Eddelbuettel Nov 22 '21 at 19:11
  • Thank you for your reply! The first batch of code in my question uses RcppArmadillo; I just did not include the header of the .cpp file. So it should in principle automatically dispatch to one of the routines you mentioned, right? – pbstx412 Nov 23 '21 at 15:44
  • Yes. And the BLAS/LAPACK libraries are 'plug-in': you can replace whichever one you have now with a different one. Running `sessionInfo()` in R will give you a hint about what you currently have; what replacements are available and how to install them depends a little on your operating system. – Dirk Eddelbuettel Nov 23 '21 at 15:51
