uBLAS Slow Matrix-SparseVector Multiplication

Question

I'm converting some of my own vector algebra code to use the optimized Boost uBLAS library. However, when I tried a SymmetricMatrix-SparseVector multiplication, I found it to be about 4x slower than my own implementation. The vector length is usually between 0 and 500, and about 70-80% of the entries are zero.

Here is my code:

void CRoutines::GetA(double a[], double vectorIn[], int sparseVectorIndexes[], int vectorLength, int sparseLength)
{
    compressed_vector<double> inVec (vectorLength, sparseLength);
    for(int i = 0; i < sparseLength; i++)
    {
        inVec(sparseVectorIndexes[i]) = vectorIn[sparseVectorIndexes[i]];
    }
    vector<double> test = prod(inVec, matrix);
    for(int i = 0; i < vectorLength; i++)
    {
        a[i] = test(i);
    }
}

sparseVectorIndexes stores the indexes of the non-zero values of the input vector, vectorLength is the length of the vector, and sparseLength is the number of non-zeros in the vector. The matrix is stored as a symmetric matrix symmetric_matrix<double, lower>.

My own implementation is a simple nested loop iteration where matrix is just a 2D double array:

void CRoutines::GetA(double a[], double vectorIn[], int sparseVectorIndexes[], int vectorLength, int sparseLength)
{
    for (int i = 0; i < vectorLength; i++)
    {
        double temp = 0;

        for (int j = 0; j < sparseLength; j++)
        {
            int row = sparseVectorIndexes[j];
            if (row <= i) // Handle lower triangular sparseness
                temp += matrix[i][row] * vectorIn[row];
            else
                temp += matrix[row][i] * vectorIn[row];
        }
        a[i] = temp;
    }
}
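
For benchmarking outside the class, the hand-written loop above can be sketched as a free function. This is a minimal sketch, not the asker's actual code: the name symSparseProd, the use of std::vector, and the jagged lower-triangle layout (row i holds i + 1 entries) are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical free-function version of the nested loop above.
// `lower` stores only the lower triangle: lower[i][j] is valid for j <= i.
// `nz` lists the indexes of the non-zero entries of `x`.
std::vector<double> symSparseProd(const std::vector<std::vector<double>>& lower,
                                  const std::vector<double>& x,
                                  const std::vector<int>& nz)
{
    const std::size_t n = lower.size();
    std::vector<double> a(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int idx : nz) {
            const std::size_t j = static_cast<std::size_t>(idx);
            // Symmetry: an element above the diagonal (j > i) is read as (j, i).
            sum += (j <= i ? lower[i][j] : lower[j][i]) * x[j];
        }
        a[i] = sum;
    }
    return a;
}
```

Timing a function like this against the uBLAS prod call on identical inputs would isolate the cost of the multiplication from the cost of building the compressed_vector and copying the result out.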

Why is uBLAS 4x slower? Am I not writing the multiplication properly? Or is there another library more suited to this?

EDIT: If I use a dense vector array instead then uBLAS is only 2x slower...

If this is in Visual Studio, did you check if you're compiling it under Debug mode? – Jacob Jun 13 '11 at 13:34
Definitely compiling to Release, optimisations all on, and not testing within the IDE. – Projectile Fish Jun 13 '11 at 13:40
Please post extended code: where does `vectorIn` come from, and what's its type? What object copies are created in the second, non-uBLAS code? Please post all the code that you are measuring to come up with the 4x slowdown number. – Steve Townsend Jun 13 '11 at 13:47
OK, I posted the extra code. Some extra info: this code is compiled to a DLL and called from C#, but I don't think that should make any difference at all. – Projectile Fish Jun 13 '11 at 14:02

score 2 · Accepted Answer · answered Jun 13 '11 at 14:33

uBLAS was not designed with performance as its primary goal, and there are libraries which are significantly faster. See, e.g., http://eigen.tuxfamily.org/index.php?title=Benchmark

answered Jun 13 '11 at 14:33

quant_dev

Wow. This could be the reason why. I was under the impression uBLAS was the fastest; I'm not sure where I picked that up from. Will give Eigen a try later. – Projectile Fish Jun 13 '11 at 14:46
@Projectile : Boost.uBLAS can serve as a mere front-end for LAPACK, UMFPACK, MUMPS, etc., increasing its performance by orders of magnitude without changing any code. See [this page](http://mathema.tician.de/node/391) for more info. – ildjarn Jun 13 '11 at 16:57

score 1 · Answer 2 · edited Apr 13 '17 at 12:53

This pdf has quite a detailed comparison of various linear algebra libraries. I came across this in this answer from Computational Science Stack Exchange, which is possibly a better place for this sort of question.

edited Apr 13 '17 at 12:53

Community

answered Mar 22 '12 at 11:21

mkm

score 0 · Answer 3 · edited Jun 13 '11 at 14:24

Not sure if it is the cause of the slowdown (did you profile to get your 4x number?) but this loop could be slow:

for(int i = 0; i < vectorLength; i++)
{
    a[i] = test(i);
}

If most of the time is spent processing the loops in your code then this extra loop could double the time (and have nothing to do with ublas). I would recommend using std::copy instead:

std::copy(test.begin(), test.end(), a);

Most compilers should recognize that this copies plain doubles and emit an optimized copy, which might fix your problem somewhat.
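
A minimal sketch of the std::copy suggestion, using std::vector<double> as a stand-in for the uBLAS result vector. The helper name copyResult is hypothetical, and the caller is assumed to own a buffer with room for at least result.size() elements.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: copy a computed result into a caller-owned buffer.
// Assumption: `out` points to at least result.size() doubles.
void copyResult(const std::vector<double>& result, double out[])
{
    // The destination is the pointer itself; out[0] would be a double,
    // not an output iterator, and would not compile.
    std::copy(result.begin(), result.end(), out);
}
```

Whether this is measurably faster than the explicit loop depends on the compiler and should be checked with a profiler.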

edited Jun 13 '11 at 14:24

Jacob

answered Jun 13 '11 at 14:16

Tom

Thanks, but I'm pretty sure it's the actual prod multiplication that's slow. If I simply remove that last loop from the code there is almost no difference in performance. I did profile to get that 4x number. – Projectile Fish Jun 13 '11 at 14:19
