Given a SciPy CSC sparse matrix "sm" with dimensions 170k x 170k and 440 million nonzero entries, and a sparse CSC vector "v" (170k x 1) with only a few nonzero entries, is there anything that can be done to improve the performance of the following operation?
result = sm.dot(v)
Currently it takes roughly 1 second. Initializing the matrices as CSR increased the time to about 3 seconds, so CSC performs better.
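A minimal, self-contained sketch for reproducing the CSC vs CSR comparison at a smaller scale. The sizes (n, nnz) and the randomly generated data are assumptions standing in for the real 170k x 170k matrix. One likely explanation for the difference: when sm is CSC, SciPy's sparse-times-sparse product only visits the stored entries of the columns of sm that match the nonzeros of v, whereas with CSR it has to scan every stored entry of sm.

```python
import time

import numpy as np
import scipy.sparse as sp

# Assumed, scaled-down sizes; the real matrix is 170k x 170k with 440M nonzeros.
n = 20_000
nnz = 800_000
rng = np.random.default_rng(0)

# Random similarity-like matrix (duplicate coordinates are summed by tocsc()).
rows = rng.integers(0, n, size=nnz)
cols = rng.integers(0, n, size=nnz)
sm_csc = sp.coo_matrix((rng.random(nnz), (rows, cols)), shape=(n, n)).tocsc()
sm_csr = sm_csc.tocsr()

# Sparse n x 1 user vector with a handful of nonzeros.
nz_rows = rng.choice(n, size=5, replace=False)
v_csc = sp.csc_matrix((np.ones(5), (nz_rows, np.zeros(5, dtype=int))), shape=(n, 1))
v_csr = v_csc.tocsr()


def best_time(f, repeat=5):
    """Return the fastest wall-clock time of f() over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        f()
        best = min(best, time.perf_counter() - t0)
    return best


r_csc = sm_csc.dot(v_csc)
r_csr = sm_csr.dot(v_csr)
print(f"CSC: {best_time(lambda: sm_csc.dot(v_csc)):.6f} s")
print(f"CSR: {best_time(lambda: sm_csr.dot(v_csr)):.6f} s")
```

Both formats give the same result; only the amount of the matrix that has to be touched differs.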
The matrix sm holds similarities between products, and v is the vector that marks which products the user bought or clicked on, so sm is the same for every user.
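Because sm is the same for every user, one option worth measuring is to stack several users' vectors as the columns of one sparse matrix and multiply once, so the per-call overhead is paid once per batch instead of once per user. The sketch below uses made-up sizes and random data; `user_vector` is a hypothetical helper, not part of SciPy.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical sizes and random data for illustration only.
n_products = 10_000
nnz = 200_000
rng = np.random.default_rng(1)
sm = sp.coo_matrix(
    (rng.random(nnz),
     (rng.integers(0, n_products, nnz), rng.integers(0, n_products, nnz))),
    shape=(n_products, n_products),
).tocsc()

# Each hypothetical user has interacted with a few products.
user_items = [rng.choice(n_products, size=4, replace=False) for _ in range(3)]


def user_vector(items, n):
    """Build one user's sparse (n x 1) indicator vector in CSC format."""
    data = np.ones(len(items))
    return sp.csc_matrix((data, (items, np.zeros(len(items), dtype=int))), shape=(n, 1))


# Current approach: one product per user.
separate = [sm.dot(user_vector(items, n_products)) for items in user_items]

# Batched approach: stack the user vectors as columns and multiply once.
V = sp.hstack([user_vector(items, n_products) for items in user_items], format="csc")
batched = sm.dot(V)  # column k holds the scores for user k
```

Column k of `batched` equals the result of the separate product for user k, so the batch size can be tuned freely against memory use.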
I'm using Ubuntu 13.04 on an Intel i3 @ 3.4 GHz with 4 cores.
While researching on SO, I read about BLAS. I typed the following into the terminal:
~$ ldd /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so
The output was:
linux-vdso.so.1 => (0x00007fff56a88000)
libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f888137f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8880fb7000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8880cb1000)
/lib64/ld-linux-x86-64.so.2 (0x00007f888183c000)
From what I understand, this means I'm already using a high-performance BLAS library. I'm still not sure whether this library supports parallel computation, but it looks like it doesn't.
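One thing worth checking: `_dotblas` only accelerates dense `numpy.dot` calls. Products involving `scipy.sparse` matrices are computed by SciPy's own compiled sparse routines and do not go through BLAS, so swapping in a faster or multi-threaded BLAS is unlikely to change the speed of `sm.dot(v)`. The sketch below prints NumPy's build configuration (a Python-side counterpart of the `ldd` check) and contrasts a dense and a sparse product on toy data.

```python
import numpy as np
import scipy.sparse as sp

# Show which BLAS/LAPACK libraries NumPy was built against.
np.show_config()

# Dense operands: this is the kind of call NumPy dispatches to BLAS
# when it is linked against one.
a = np.ones((3, 3))
x = np.arange(3.0)
dense_result = a.dot(x)

# Sparse operand: computed by SciPy's own sparse routines, not by BLAS.
sparse_result = sp.csc_matrix(a).dot(x)
```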
Could multi-core processing help boost performance? If so, is there a Python library that could help?
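A sketch of one way to try multi-core processing with only the standard library: split the nonzeros of v into chunks, let each thread multiply the matching columns of sm, and sum the partial results. `parallel_sparse_matvec` and `n_workers` are hypothetical names. Whether the threads actually run concurrently depends on whether your SciPy version releases the GIL inside its sparse kernels, and with only a few nonzeros in v each chunk may be too small to benefit, so treat this as an experiment to time rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import scipy.sparse as sp


def parallel_sparse_matvec(sm, v, n_workers=4):
    """Compute sm @ v (as a dense 1-D array) for CSC `sm` and sparse (n x 1) `v`
    by splitting v's nonzeros into chunks and summing per-chunk products."""
    v = sp.csc_matrix(v)
    idx = v.indices  # row positions of v's nonzeros
    vals = v.data    # their values
    chunks = np.array_split(np.arange(len(idx)), n_workers)
    out_dtype = np.result_type(sm.dtype, vals.dtype)

    def partial(chunk):
        if len(chunk) == 0:
            return np.zeros(sm.shape[0], dtype=out_dtype)
        # Column slicing is cheap for CSC; the product is a dense 1-D vector.
        return sm[:, idx[chunk]].dot(vals[chunk])

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(partial, chunks))
    return np.sum(parts, axis=0)


# Usage on small random data (hypothetical sizes).
rng = np.random.default_rng(2)
n = 5_000
nnz = 100_000
sm = sp.coo_matrix(
    (rng.random(nnz), (rng.integers(0, n, nnz), rng.integers(0, n, nnz))),
    shape=(n, n),
).tocsc()
v = sp.random(n, 1, density=0.002, format="csc", random_state=3)
result = parallel_sparse_matvec(sm, v)
```

If threads turn out not to overlap, the same chunking can be moved to `multiprocessing`, at the cost of getting the matrix into each worker process.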
I was also considering implementing this in Cython, but I don't know whether that would lead to good results.
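Before writing Cython, it may help to see the loop it would contain. The sketch below (the function name is hypothetical) accumulates only the columns of the CSC matrix that match the nonzeros of v by reading `indptr`, `indices`, and `data` directly. Its cost scales with the number of stored entries in those columns rather than with all 440 million entries; a Cython or Numba port would replace the Python loop and `np.add.at` with typed loops.

```python
import numpy as np
import scipy.sparse as sp


def csc_times_sparse_vector(sm, v):
    """Return sm @ v as a dense 1-D array, touching only the columns of
    `sm` that correspond to nonzeros of the sparse (n x 1) vector `v`."""
    sm = sm.tocsc()  # no copy if sm is already CSC
    v = v.tocsc()
    indptr, indices, data = sm.indptr, sm.indices, sm.data
    out = np.zeros(sm.shape[0], dtype=np.result_type(sm.dtype, v.dtype))
    for j, vj in zip(v.indices, v.data):
        start, end = indptr[j], indptr[j + 1]
        # np.add.at (rather than `out[...] += ...`) sums correctly even if a
        # column contains duplicate row indices (non-canonical CSC).
        np.add.at(out, indices[start:end], data[start:end] * vj)
    return out


# Usage on small random data (hypothetical sizes).
rng = np.random.default_rng(4)
n = 5_000
nnz = 100_000
sm = sp.coo_matrix(
    (rng.random(nnz), (rng.integers(0, n, nnz), rng.integers(0, n, nnz))),
    shape=(n, n),
).tocsc()
v = sp.random(n, 1, density=0.002, format="csc", random_state=5)
result = csc_times_sparse_vector(sm, v)
```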
Thanks in advance.