
I am looking for the fastest way to solve the following problem:

Given some volume of lattice points in a 3D grid, some points b_i (the boundary) satisfy f(b_i)=0, while another point a_0 satisfies f(a_0)=1.

The value at every other (non-boundary) point is some linear combination of the values at the 26 surrounding points. For example, I could want

f(i, j, k)= .05*f(i+1, j, k)+.1*f(i, j+1, k)+.07*f(i, j, k+1)+...

The coefficients .05 + .1 + .07 + ... sum to 1. My objective is to solve for f(x_i) for all x_i in the volume.

Currently, I am using the successive over-relaxation (SOR) method, which basically initializes the boundary of the volume, assigns to each point the weighted average of its 26 surrounding points, and repeats. SOR then takes a weighted combination of the newly computed value of f(x_i) and its value from the previous iteration.
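
In other words, with relaxation factor w, the update for an interior point is roughly

    f_new(i, j, k) = (1 - w)*f_old(i, j, k) + w*(.05*f(i+1, j, k) + .1*f(i, j+1, k) + .07*f(i, j, k+1) + ...)

using the example coefficients from above.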

I was wondering if anyone knows of a faster way to solve this problem for a 3D grid of around 102x102x48. SOR currently takes about 500-1000 iterations to converge to the level I want (varying depending on the coefficients used). I am most willing to use MATLAB, IDL, and C++.

Does anyone have an idea of how fast SOR is compared to converting the problem into a linear system and using matrix methods (like BCGSTAB)? Also, which method could be most effectively (and easily) parallelized? I have access to a 250-core cluster, and have been trying to parallelize the SOR method using MPI and C++, but haven't seen as much speedup as I would like (ideally on the order of 100-fold). I would greatly appreciate any ideas on speeding up the solution to this problem. Thanks.

Marvin
  • I've got a problem with your problem (though a simple one): do you have access to the coefficients of the linear combination or are they variables too? – B. Decoster Jul 07 '11 at 14:20
  • Are the linear combination coefficients the same for all points? Do the boundary points form a complete hull? – Svante Jul 07 '11 at 14:42
  • @Fezvez: yes, I do have access to the coefficients. – Marvin Jul 07 '11 at 19:12
  • @Svante: the linear combination coefficients will not be the same for all points. I don't exactly know what you mean by complete hull, but the boundary points will probably form a convex hull. The volume is not necessarily simply connected though. – Marvin Jul 07 '11 at 19:15

2 Answers


If you're comfortable with multithreading, using a red-black scheme for SOR can give a decent speedup. For a 2D problem, imagine a checkerboard - the red squares only depend on the black squares (and possibly themselves), so you can update all the red squares in parallel, and then repeat for all the black ones. Note that this does converge more slowly than the simple ordering, but it lets you spread the work over multiple threads.
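
A minimal sketch of one such sweep for the 2D checkerboard case, assuming a plain 5-point averaging stencil and OpenMP (both of those are illustrative choices, not anything from the question):

```cpp
#include <vector>

// One red-black SOR sweep on a 2D nx-by-ny grid stored row-major in f.
// Assumes a constant 5-point averaging stencil purely for illustration;
// omega is the over-relaxation factor (omega = 1 gives Gauss-Seidel).
void red_black_sor_sweep(std::vector<double>& f, int nx, int ny, double omega)
{
    auto idx = [nx](int i, int j) { return j * nx + i; };

    for (int colour = 0; colour < 2; ++colour) {
        // Every point of one colour depends only on points of the other
        // colour, so each colour's updates can be split across threads.
        #pragma omp parallel for
        for (int j = 1; j < ny - 1; ++j) {
            // start index chosen so that (i + j) % 2 == colour
            for (int i = 2 - (j + colour) % 2; i < nx - 1; i += 2) {
                double avg = 0.25 * (f[idx(i - 1, j)] + f[idx(i + 1, j)]
                                   + f[idx(i, j - 1)] + f[idx(i, j + 1)]);
                f[idx(i, j)] = (1.0 - omega) * f[idx(i, j)] + omega * avg;
            }
        }
    }
}
```

For the full 26-neighbour stencil in the question, a simple two-colouring no longer decouples the points (diagonal neighbours share a colour), so more colours or a different splitting would be needed, but the structure of the sweep stays the same.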

Conjugate gradient methods will generally converge faster than SOR (if I remember correctly, by about an order of magnitude). I've never used BCGSTAB, but I remember GMRES working well on non-symmetric problems, and they can probably both benefit from preconditioning.

As for opportunities for parallelization, most CG-type methods only need you to compute the matrix-vector product A*x, so you never need to form the full matrix. That's probably the biggest cost of each iteration, and so it's where you should look at multithreading.
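
For this problem, written as A*f = b with A = I - C (C holding the 26 neighbour coefficients), a matrix-free application of A might look roughly like this. The storage layout (`coeff`, `offsets`) is an assumed one, and for simplicity the sketch treats the outer shell of the grid as the boundary:

```cpp
#include <array>
#include <vector>

struct Grid {
    int nx, ny, nz;
    std::vector<std::array<double, 26>> coeff;   // per-point neighbour weights
    std::vector<std::array<int, 3>> offsets;     // the 26 (di, dj, dk) offsets
    int idx(int i, int j, int k) const { return (k * ny + j) * nx + i; }
};

// Matrix-free matvec: y = (I - C) * x over the interior points.
// y is assumed pre-sized to x.size(); boundary entries are not written here.
void apply_A(const Grid& g, const std::vector<double>& x, std::vector<double>& y)
{
    #pragma omp parallel for
    for (int k = 1; k < g.nz - 1; ++k)
        for (int j = 1; j < g.ny - 1; ++j)
            for (int i = 1; i < g.nx - 1; ++i) {
                const int p = g.idx(i, j, k);
                double sum = 0.0;
                for (int n = 0; n < 26; ++n) {
                    const auto& d = g.offsets[n];
                    sum += g.coeff[p][n] * x[g.idx(i + d[0], j + d[1], k + d[2])];
                }
                y[p] = x[p] - sum;   // one row of A = I - C
            }
}
```

Boundary rows are left out of the sketch; in practice the fixed boundary values of f would be moved into the right-hand side b.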

Two good references on this are Golub and Van Loan (Matrix Computations), and Trefethen and Bau (Numerical Linear Algebra). The latter is much more readable IMHO.

Hope that helps...

celion

I've got (I think) an exact solution; however, it might be long. The main problem would be computation on a very, very big (and hopefully sparse) matrix.

Let's say you have N points in your volume. Let's call these points p1...pN. What you are looking for is f(p1)...f(pN).

Let's take a point pi. It has 26 neighbours; call them pi-1...pi-26. I suppose you have access, for each point pi, to the coefficients of the linear combination. Let's call them ci-1...ci-j...ci-26 (where ci-j is the coefficient of the linear combination linking point pi to its j-th neighbour).

Even better, you can consider that pi is a linear combination of all the points in your space, where all but 26 of the coefficients are equal to 0. That gives you coefficients ci-1...ci-N.

You can now build a big N*N matrix of these ci-j coefficients:

+---------------------    -------+   
|  0  | c1-2 | c1-3 | ... | c1-N |   |f(p1)|    |f(p1)|
+---------------------    -------+   |     |    |     |
| c2-1|   0  | c2-3 | ... | c2-N |   |f(p2)|    |f(p2)|
+---------------------    -------+ * .     . =  .     .
|                                |   .     .    .     .
.                                .   .     .    .     .
+---------------------    -------+   .     .    .     .
| cN-1| cN-2 | cN-3 | ... |   0  |   |f(pN)|    |f(pN)|
+---------------------    -------+

Amazing! The solution you're looking for is one of the eigenvectors corresponding to the eigenvalue 1!

Use an optimised matrix library that computes eigenvectors efficiently (for sparse matrices) and hope it is already parallelised!
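
As a rough sketch of the assembly step, here is one way to build C in triplet form using the Eigen library in C++ (the library choice and the way the per-point coefficient lists are passed in are assumptions, not something from the thread):

```cpp
#include <Eigen/Sparse>
#include <utility>
#include <vector>

// Build the sparse N*N matrix C from per-point coefficient lists.
// rows[i] holds the (neighbour index, coefficient) pairs for point p_i; how
// that list is produced from the 3D grid is left to the application, and rows
// for boundary points (where f is fixed) would need special handling.
Eigen::SparseMatrix<double>
build_C(int N, const std::vector<std::vector<std::pair<int, double>>>& rows)
{
    std::vector<Eigen::Triplet<double>> trips;
    trips.reserve(26 * static_cast<std::size_t>(N));

    for (int i = 0; i < N; ++i)
        for (const auto& [j, c] : rows[i])
            trips.emplace_back(i, j, c);        // entry C(i, j) = c

    Eigen::SparseMatrix<double> C(N, N);
    C.setFromTriplets(trips.begin(), trips.end());
    return C;
}
```

At 102x102x48 that is roughly 500,000 unknowns with at most 26 nonzeros per row, which is well within reach of a sparse eigensolver (e.g. eigs in MATLAB) or of an iterative solver such as Eigen's BiCGSTAB applied to the equivalent linear system.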

Edit: Amusing, I just reread your post; it seems I just gave you your BCGSTAB solution... Sorry! ^^

Re-edit: In fact, I'm not sure: are you talking about the "Biconjugate Gradient Stabilized Method"? Because I don't see the linear method you're talking about if you're doing gradient descent...

Re-re-edit: I love math =) I can even prove that, given the condition that the ci sum to 1, 1 is in fact an eigenvalue. I do not know the dimension of the corresponding eigenspace, but it is at least 1!
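
One way to see it: every row of the matrix sums to 1, so multiplying by the all-ones vector e gives

    (C * e)_i = ci-1 + ci-2 + ... + ci-N = 1   for every row i,   hence   C * e = e.

So 1 is an eigenvalue of C, and its eigenspace is at least one-dimensional.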

B. Decoster
  • Thanks for the response. I never saw that the solution corresponds to an eigenvector, thanks for the insight! I'll try using an eigenvector routine in Matlab and IDL and see how fast it is. And I was talking about the Biconjugate Gradient Stabilized Method, but I was looking at solving the linear system directly rather than finding an eigenvector. – Marvin Jul 07 '11 at 19:29
  • Hmmm, so I ran into a problem. So recall the boundary condition that `f(a_0)=1` for some `a_0`. Since we don't include `f(a_0)` in the vector of variables that we are solving for, this condition effectively adds a constant vector to the left side of the matrix equation you wrote (nice typesetting btw), or equivalently it subtracts the constant vector from the right side. Then the solution is no longer an eigenvector. – Marvin Jul 07 '11 at 20:39
  • Erm... not really. Just find an eigenvector V. Pray that f(a_0) != 0. If so, simply normalize the vector: V' = V/f(a_0). You'll end up with an eigenvector V' which satisfies f(a_0)=1. And thanks for trying the method! – B. Decoster Jul 08 '11 at 08:05
  • I mean, there is some i in `[1..N]` such that `a_0 = pi`, right? So it is indeed in the vector V. – B. Decoster Jul 08 '11 at 09:01