
I am using CUSP's conjugate gradient method to solve a symmetric sparse linear system, and I have no idea why it doesn't converge. The matrices I use are not that large (1K to 100K rows). MKL easily solves the same linear systems, so the matrix is not ill-conditioned. I also tried adding a preconditioner, but it made no difference:

both the diagonal preconditioner and AINV (approximate inverse) gave unbounded growth of the residual (for both cg and bicgstab)

Here's my code:

cusp::csr_matrix <int, float, cusp::device_memory> A (n, n, nnz);

for (i = 0; i < n + 1; i++)
    A.row_offsets[i] = csrRowPtr[i] - 1;    // shift 1-based row pointers to 0-based
for (i = 0; i < nnz; i++)
    A.values[i] = csrVal[i];
for (i = 0; i < nnz; i++)
    A.column_indices[i] = csrColInd[i] - 1; // shift 1-based column indices to 0-based

cusp::array1d <float, cusp::device_memory> x (A.num_rows, 0);
cusp::array1d <float, cusp::device_memory> b (A.num_rows, 1);

for (i = 0; i < n; i++)
    b[i] = b_host[i];

cusp::verbose_monitor<float> monitor(b, 100, 1e-3);
cusp::identity_operator<float, MemorySpace> M(A.num_rows, A.num_rows);
    /*
    cusp::precond::diagonal<float, MemorySpace> M(A);
    cusp::precond::scaled_bridson_ainv<float, MemorySpace> M(A, .1);
    */
cusp::krylov::cg(A, x, b, monitor, M);

for (i = 0; i < n; i++)
    x_host[i] = x[i];

Why isn't it working correctly?

P.S. As I understand it, CUSP assumes zero-based indexing, which is why I decrement csrRowPtr and csrColInd. When I used NVIDIA's cuSPARSE library there was an option to set other parameters, such as the matrix type and fill mode. How can I be sure they are set correctly in CUSP?

Max
  • What method in MKL is successfully solving the system? – talonmies Feb 09 '14 at 15:35
  • Preconditioned CG. But I also solved these systems (under 40K) with a sample conjugate gradient from CUDA SDK. – Max Feb 09 '14 at 15:48
  • So using an incomplete LU factorization as a preconditioner and double precision conjugate gradient solver? – talonmies Feb 09 '14 at 16:00
  • Not exactly. I used the CUDA conjugate gradient solver (without a preconditioner) in single precision (I have a compute capability 1.3 CUDA device; it doesn't support double precision). The MKL solution was obtained by my research advisor, but I think it uses a Cholesky preconditioner. – Max Feb 09 '14 at 16:14
  • You have said that the same systems solve on Intel MKL, but not in CUSP using the code above. I am trying to get an accurate description of exactly what you are doing in MKL. Because (AFAIK) MKL is double precision and uses an ILU preconditioner. That is very different from your CUSP code, and it might be why the CUSP code fails. And, for what it is worth, a compute capability 1.3 device does support double precision. – talonmies Feb 09 '14 at 16:18
  • Really? I read somewhere that double precision is supported starting from 2.0. That's a surprise for me :) I'm not sure about MKL, because I only study CUDA, but the SDK sample of CG works correctly, which is why I wonder why this one doesn't. Is the code correct? Does CUSP assume that only the upper part of the symmetric matrix is stored in CSR? – Max Feb 09 '14 at 16:27
  • Are you sure that you are not doing anything wrong on the cusp side? Since you are using MKL and the CUDA SDK and both converge, I assume you know the solution. What happens if you use cusp's cg starting from the solution point you already have (I'm not a cusp user, so I don't know if this is possible)? Does cusp get stuck at the starting point, which is what one should expect? This is a consistency test I typically do to check for bugs in my optimization algorithms. – Vitality Feb 09 '14 at 17:51
  • This code appears to be largely lifted from the [cusp sample code](https://github.com/cusplibrary/cusplibrary/blob/master/examples/Solvers/cg.cu). Is there some reason you can't provide a full compilable example code that shows the problem? Your suggestion seems to be that the cg solver fails on a variety of sample problems you try, so how about the synthetic case given in the cusp sample or the CG SDK sample? Do you have a typedef for MemorySpace somewhere? Please provide a complete sample like the cusp or SDK CG samples. – Robert Crovella Feb 09 '14 at 21:19
  • I suspect a problem in your `A` matrix assembly. Try using the CSR matrix verification available in `cusp/verify.h`. Here's a [complete worked example](http://pastebin.com/g3wBRNxn) showing the assembly of the `A` matrix using the routine from the cuda SDK sample. It seems to converge quickly for me. – Robert Crovella Feb 09 '14 at 23:59
  • Thanks for your comprehensive answer! I tried using simple matrices (identity and tridiagonal). It works correctly, but the program from pastebin.com gave me the same results. I think I was wrong when I said that the SDK CG easily solves my problem (it needs about 400 iterations). It looks like a preconditioner is needed. I tried CUSP's bridson_ainv, but the solution didn't converge (with default parameters). Where can I read about choosing them for my problem? – Max Feb 10 '14 at 22:24
  • Each available [cusp preconditioner](https://cusplibrary.github.io/group__preconditioners.html) has a brief description of its application. Have you read those? Have you tried the diagonal preconditioner? – Robert Crovella Feb 11 '14 at 06:38
  • Yes, I have; the diagonal preconditioner didn't give a significant improvement. I've read the description, but unfortunately it's too brief. Usually we try to find an approximate inverse matrix with the same pattern of non-zeros. So nonzero_per_row should be approximately equal to nnz/N, right? And how do we choose the dropping parameters? – Max Feb 11 '14 at 14:08
  • A number of the preconditioners only work for SPD matrices. The bridson ainv method is described [here](http://www.cs.ubc.ca/~rbridson/docs/refining.pdf) in particular section 4, including discussions of drop tolerance, etc. – Robert Crovella Feb 11 '14 at 20:02
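As background to the preconditioner discussion above: a diagonal (Jacobi) preconditioner simply scales the residual by the inverse diagonal of A. A host-side sketch of the idea (illustrative helper names, not CUSP's implementation):

```cpp
#include <vector>

// Apply a Jacobi (diagonal) preconditioner: z = D^{-1} r, where D is the
// diagonal of A, located by scanning each CSR row for its diagonal entry.
// Sketch only; a real implementation would precompute the inverse diagonal.
std::vector<float> jacobi_apply(int n,
                                const std::vector<int>& row_offsets,
                                const std::vector<int>& column_indices,
                                const std::vector<float>& values,
                                const std::vector<float>& r) {
    std::vector<float> z(n, 0.0f);
    for (int i = 0; i < n; i++)
        for (int k = row_offsets[i]; k < row_offsets[i + 1]; k++)
            if (column_indices[k] == i)      // found the diagonal entry A(i,i)
                z[i] = r[i] / values[k];
    return z;
}
```

This is cheap but only helps when the matrix is strongly diagonally dominant, which is consistent with it giving little improvement here.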

1 Answer


MKL's symmetric CSR format stores only the elements of the upper triangle, but CUSP's CSR format requires all nonzero elements to be stored explicitly, even when the linear system is symmetric.
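A host-side sketch of the required expansion (illustrative helper, not CUSP API): mirror every off-diagonal entry of the upper-triangular CSR arrays and rebuild the row offsets.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

struct Csr {
    std::vector<int> row_offsets, column_indices;
    std::vector<float> values;
};

// Expand an upper-triangular CSR matrix (MKL symmetric storage) into the
// full CSR matrix that CUSP expects, via (row, col, value) triplets.
Csr expand_symmetric(int n, const std::vector<int>& rp,
                     const std::vector<int>& ci, const std::vector<float>& v) {
    std::vector<std::tuple<int, int, float>> t;
    for (int r = 0; r < n; r++)
        for (int k = rp[r]; k < rp[r + 1]; k++) {
            int c = ci[k];
            t.emplace_back(r, c, v[k]);
            if (c != r) t.emplace_back(c, r, v[k]); // mirror off-diagonal entry
        }
    std::sort(t.begin(), t.end());                  // row-major order for CSR
    Csr out;
    out.row_offsets.assign(n + 1, 0);
    for (auto& [r, c, val] : t) {
        out.row_offsets[r + 1]++;                   // count entries per row
        out.column_indices.push_back(c);
        out.values.push_back(val);
    }
    for (int r = 0; r < n; r++)                     // prefix-sum the counts
        out.row_offsets[r + 1] += out.row_offsets[r];
    return out;
}
```

After this expansion, the full arrays can be copied into `A.row_offsets`, `A.column_indices`, and `A.values` as in the question's code.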

Also, I think

for (i = 0; i < n; i++)
    x_host[i] = x[i];

is not a good idea, because each `x[i]` access triggers a separate element-wise device-to-host transfer. First copy the whole vector back to host memory:

cusp::array1d<float, cusp::host_memory> _x = x;

then copy it to x_host, or whatever your result array is:

for (i = 0; i < n; i++)
    x_host[i] = _x[i];
someguy