I am working on an optimization problem, and to supply the analytic gradient to the routine, I need to compute the gradient of large 3D arrays with respect to parameters. The largest of these arrays s are of dimensions [L,N,J] where L,J ~ 2000, and N= 15. L and N stand for nodes over which the arrays are then aggregated up with some fixed weights w to vectors of length J. Computing the gradient naively generates a [L,N,J,J] arrays x whose elements are x(l,n,j,k) = -s(l,n,j)s(l,n,k) if j=/=k and x(l,n,j,j) = s(l,n,j)(1-s(l,n,j)).
Several functions in the procedure would use x as input, but as of right now I cannot keep x in memory due to its size. My approach so far has been to compute and directly aggregate up x over L and N to only ever store JxJ matrices, but the downside is that I cannot reuse x in other functions. This is what the following code does:
arma::mat agg_dsnode_ddelta_v3(arma::cube s_lnj,
arma::mat w_ln,
arma::vec w_l){
// Normal Matrix dimensions
unsigned int L = s_lnj.n_rows;
unsigned int N = s_lnj.n_cols;
unsigned int J = s_lnj.n_slices;
//resulting matrix
arma::mat ds_ddelta_jj = arma::mat(J,J, arma::fill::zeros);
for (unsigned int l = 0; l < L; l++) {
for (unsigned int n = 0; n < N; n++) {
arma::vec s_j = s_lnj.subcube(arma::span(l), arma::span(n), arma::span());
ds_ddelta_jj += - arma::kron(w_l(l) * w_ln(l,n) * s_j, s_j.as_row()) + arma::diagmat(w_l(l) * w_ln(l,n) * s_j);
}
}
return ds_ddelta_jj;
}
Alternatively, the 4-D array x could for instance be computed with sparseMatrix, but this approach does not scale up when the L and J increase
library(Matrix)
L = 2
N = 3
J = 4
s_lnj <- array(rnorm(L*N*J), dim=c(L,N,J))
## create spare Matrix with s(l,n,:) vertically on the diagonal
As_lnj = A = sparseMatrix(i=c(1:(L*N*J)),j=rep(1:(L*N), each=J),x= as.vector(aperm(s_lnj, c(3, 1, 2))))
## create spare Matrix with s(l,n,:) horizontally on the diagonal
Bs_lnj = sparseMatrix(i=rep(1:(L*N), each=J),j=c(1:(L*N*J)),x= as.vector(aperm(s_lnj, c(3, 1, 2))))
## create spare Matrix with s(l,n,:) diagonnally
Cs_lnj = sparseMatrix(i=c(1:(L*N*J)),j=c(1:(L*N*J)),x= as.vector(aperm(s_lnj, c(3, 1, 2))))
## compute 4-D array with sparseMatrix product
x = -(As_lnj %*% Bs_lnj) + Cs_lnj
I was wondering if you knew of faster way to implement the first code, or alternatively of an approach that would make the second one scalable.
Thank you in advance