
I need to numerically evaluate some integrals which are all of the form shown in this image:

These integrals are the matrix elements of an N×N matrix, so I need to evaluate them for all combinations of n and m in the range 1 to N. The integrals are symmetric in n and m, which I have exploited in my current nested-for-loop approach:

function [V] = coulomb3(N, l, R, R0, c, x)
r1 = 0.01:x:R;
r2 = R:x:R0;
r = [r1 r2];
rl1 = r1.^(2*l);
rl2 = r2.^(2*l);
sines = zeros(N, length(r));
V = zeros(N, N);
for i = 1:N
    sines(i, :) = sin(i*pi*r/R0);
end

x1 = length(r1);
x2 = length(r);
for nn = 1:N
    for mm = 1:nn
        f1 = (1/6)*rl1.*r1.^2.*sines(nn, 1:x1).*sines(mm, 1:x1);
        f2 = ((R^2/2)*rl2 - (R^3/3)*rl2.*r2.^(-1)).*sines(nn, x1+1:x2).*sines(mm, x1+1:x2);
        value = 4*pi*c*x*trapz([f1 f2]);
        V(nn, mm) = value;
        V(mm, nn) = value;
    end
end

I figured that calling sin() in the loop was a bad idea, so I precompute all the needed values and store them. To evaluate the integrals I use trapz, but as the first and the second/third integrals have different ranges, the function values need to be calculated separately and then combined.

I've tried a couple of different ways of vectorizing this, but the only one that gives correct results takes much longer than the loop above (it used gmultiply, and the arrays created are enormous). I've also derived an analytical solution (possible assuming m and n are integers and R0 > R > 0), but it involves a cosine integral (cosint in MATLAB) which is extremely slow for large N.

I'm not sure the entire thing can be vectorized without creating very large arrays, but the inner loop at least should be possible. Any ideas would be greatly appreciated!

The inputs I use currently are:

R0 = 1000;
R = 8.4691;
c = 0.393*10^(-2);
x = 0.01;
l = 0;   % can reasonably be 0-6
N = 20;  % increasing the value will give the same results,
         % but I would like to be able to do at least N = 600

Using these values I get:

V(1, 1:3) = 873.379900963549   -5.80688363271849   -3.38139152472590

The diagonal values never converge with increasing R0, though, so they are less interesting.

M-P

1 Answer


You will lose the gain from the symmetry of the problem with my approach, but that only means a factor-of-2 loss. Odds are that you'll still benefit in the end.

The idea is to use multidimensional arrays, making use of the fact that trapz supports such inputs. I'll demonstrate the first term in your figure; the two others can be handled similarly, as the point is the technique:

r1 = 0.01:x:R;
r2 = R:x:R0;
r = [r1 r2].';
rl1 = r1.'.^(2*l);
rl2 = r2.'.^(2*l);
%// V = zeros(N, N);  not needed now, see later
%// you can define sines in a vectorized way as well:
sines = sin(r*(1:N)*pi/R0);     %// CHANGED!! now size [Nr, N]

%// note that implicitly r is of size [Nr, 1, 1]
%// and sines is of size [Nr, N, 1]
sines2mat = permute(sines,[1, 3, 2]);  %// size [Nr, 1, N]

%// the first term in V: perform integral along first dimension
%//V1 = 1/6*squeeze(trapz(bsxfun(@times,bsxfun(@times,r.^(2*l+2),sines),sines2mat),1))*x;  %// 4*pi*c prefactor might be physics, not math
V1 = 1/6*permute(trapz(bsxfun(@times,bsxfun(@times,r.^(2*l+2),sines),sines2mat),1),[2,3,1])*x;  %// 4*pi*c prefactor might be physics, not math

The key point is that bsxfun(@times,r.^(2*l+2),sines) is a matrix of size [Nr,N,1], which is again multiplied by sines2mat using bsxfun; the result is of size [Nr,N,N], and element (k1,k2,k3) corresponds to the integrand at radial point k1, n=k2 and m=k3. Calling trapz() explicitly along the first dimension (which would be the default anyway) reduces this to an array of size [1,N,N], which is just what you need after a good squeeze(). Update: as per @Dev-iL's comment, you should use permute instead of squeeze to get rid of the leading singleton dimension, as that may be more efficient.

The two other terms can be handled the same way, and of course it might still help if you restructure the integrals based on overlapping and non-overlapping parts.
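As a side note, on MATLAB R2016b or newer, implicit expansion lets you express the same computation without bsxfun. This is only a sketch, assuming the same variable definitions as above (r a column vector of size [Nr,1], sines of size [Nr,N]):

```matlab
%// Same technique via implicit expansion (R2016b+); a sketch, not tested on old releases.
sines2mat = permute(sines, [1, 3, 2]);         %// size [Nr, 1, N]
integrand = r.^(2*l+2) .* sines .* sines2mat;  %// broadcasts to [Nr, N, N]
V1 = 1/6 * permute(trapz(integrand, 1), [2, 3, 1]) * x;
```

The memory footprint is the same as with bsxfun; only the notation is lighter.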

    I'd suggest `permute` instead of `squeeze` as in my experience it performs much better. It's a bit like the difference between `for` and `while` (if you know which dimension is the singleton one, just get rid of *it* specifically)... – Dev-iL Feb 27 '16 at 12:17
  • I get an error (number of elements must not change) in the reshape command unless I change it to `reshape(sines, [length(r), 1, N])`, but then I get an error from bsxfun (non-singleton dimensions must match) – M-P Feb 27 '16 at 12:29
  • @MadsPeter sorry, that was a brain fart, should be `permute`, no `reshape`s in there. Please see my updated version:) – Andras Deak -- Слава Україні Feb 27 '16 at 12:40
  • @MadsPeter there was another issue, leading to the error in `bsxfun`. I've finally had the time to fire up matlab and test it: now it's fixed. The problem was that I thought that `r` is a column vector, while it was a row one. I changed a few definitions at the start, and vectorized the computation of `sines`. Please let me know if there are further issues. – Andras Deak -- Слава Україні Feb 27 '16 at 12:55
  • @AndrasDeak Thanks, all is working now! I will follow this technique for the other integrals and make an update when I know I get the correct results. – M-P Feb 27 '16 at 12:55
  • @MadsPeter if this answer has solved your question please consider [accepting it](http://meta.stackexchange.com/q/5234/179419) by clicking the check-mark. This indicates to the wider community that you've found a solution and gives some reputation to both the answerer and yourself. There is no obligation to do this. – Dev-iL Feb 27 '16 at 13:50
  • @Dev-iL I've run some tests, and while this method does produce the correct results it is very memory intensive (for N = 400 it's not runnable as I run out of memory). – M-P Feb 27 '16 at 13:59
  • @MadsPeter did you try increasing [Java Heap Memory](http://blogs.mathworks.com/community/2010/04/26/controlling-the-java-heap-size/)? Another thing you might want to try is what I call "blockwise vectorization" (which is something in between loops and complete vectorization, for memory-intensive computations). You can see a demonstration of this idea [here](http://stackoverflow.com/questions/28913155/is-it-possible-to-speed-up-this-matlab-script/28913274#28913274). Additionally, if you don't need the precision of `double` you might want to work in `single` precision to save memory. – Dev-iL Feb 27 '16 at 14:09
  • @Dev-iL Increasing Java Heap Memory seems to let the code run for longer before giving an error, but it doesn't finish in a reasonable time. I think a hybrid approach is the way to go and I will try to implement one. I'll see if I can use `single` instead. – M-P Feb 27 '16 at 14:28
  • @MadsPeter this is how vectorization works: you gain computational time at the cost of memory. I suspect that you'll need double precision for your Coulomb integrals though, but this also depends on your application (and on `l`). You could chip off a bit on memory need by linearizing the two `n,m` indices: constructing an array which contains the lower triangular `(n,m)` pairs, working with arrays of size `[Nr, N(N+1)/2]` instead of `[Nr, N, N]`, and assigning the result to upper triangular form in the end, symmetrizing the final result. – Andras Deak -- Слава Україні Feb 27 '16 at 14:39
  • @AndrasDeak Yes I'm aware, I was just hoping it would be less memory intense than it is I guess. I will keep your comments in mind while trying to implement a hybrid version. – M-P Feb 27 '16 at 15:05
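Following up on the blockwise-vectorization and triangular-indexing suggestions in the comments above, a middle ground is to keep the outer loop over nn but vectorize the inner loop over mm = 1:nn, so peak memory per iteration is [nn, Nr] instead of [Nr, N, N]. This is only a sketch, assuming the variable names from the question's coulomb3 (sines of size [N, Nr]) and a release with implicit expansion (R2016b+; substitute bsxfun on older releases):

```matlab
%// Hybrid sketch: precompute the radial weights once, then integrate
%// row nn against all rows mm = 1:nn in a single trapz call.
w1 = (1/6) * rl1 .* r1.^2;               %// weight on [0, R]
w2 = (R^2/2)*rl2 - (R^3/3)*rl2./r2;      %// weight on [R, R0]
w  = [w1 w2];                            %// size [1, Nr]
V  = zeros(N, N);
for nn = 1:N
    block = (w .* sines(nn, :)) .* sines(1:nn, :);  %// size [nn, Nr]
    V(nn, 1:nn) = 4*pi*c*x * trapz(block, 2).';     %// integrate along dim 2
    V(1:nn, nn) = V(nn, 1:nn).';                    %// symmetrize
end
```

This keeps the factor-of-2 saving from symmetry while bounding memory by the largest inner block.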