
I have a vector v with many 0s and only a few non-zero entries (mostly 1s), and a square, symmetric matrix m with integer values. I want to speed up the multiplication np.dot(v, m).

Test script

The following script generates similar data and times the part I'm interested in:

#!/usr/bin/env python

import time
import numpy as np

np.random.seed(0)

n = 20000
m = 20  # number of non-zero entries (note: this name is reused for the matrix below)
assert n >= m

# Create the vector
v = [1] * m + [0] * (n - m)
np.random.shuffle(v)
v = np.array(v, np.int8)

# Create the matrix
m = np.random.randint(0, 256, size=(n, n), dtype=np.uint8)
m = (m + m.T) / 2  # uint8 addition wraps modulo 256; fine for a timing benchmark
m = m.astype(np.uint8)

# Multiplication
t0 = time.time()
result = np.dot(v, m)
t1 = time.time()

# Check the results
print("result shape: {}".format(result.shape))
print("result[0]: {}".format(result[0]))  # should be 1757
print('Time: {:0.2f}s'.format(t1 - t0))

Test results

I timed the following variations of the above script:

| Variation                                         | Time   |
| ------------------------------------------------- | ------ |
| Original                                          | 21.65s |
| (1) m = m.astype(np.float32)                      | 0.09s  |
| (2) v = v.astype(np.uint8)                        | 4.87s  |
| (3) v = v.astype(np.int16); m = m.astype(np.int16) | 5.77s |
| (4) = (3) + matmul instead of dot                 | 6.91s  |
| (5) = (1) + matmul instead of dot                 | 0.09s  |
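The output dtype hints at what dot has to do internally: with the original int8/uint8 inputs, NumPy has to promote to a common integer type and run its generic fallback loop, whereas float32 inputs can be handed to a BLAS routine directly (when NumPy is linked against BLAS). A minimal sketch of the promotion rules, independent of any BLAS claims:

```python
import numpy as np

v = np.array([1, 0, 1], dtype=np.int8)
m = np.array([[1, 2, 3],
              [2, 4, 5],
              [3, 5, 6]], dtype=np.uint8)

# int8 and uint8 have no common small type covering both ranges,
# so NumPy promotes to int16 before the generic dot loop.
print(np.promote_types(np.int8, np.uint8))  # int16
print(np.dot(v, m).dtype)                   # int16

# float32 inputs stay float32 and can go straight to BLAS.
print(np.dot(v.astype(np.float32), m.astype(np.float32)).dtype)  # float32
```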

Question

Looking at my test results, I have two questions:

  1. Why is there such a huge difference? Is it type/boundary checks?
  2. Is there another way to speed it up, e.g. with sparse matrices? (I tried v = scipy.sparse.csr_matrix(v), but I didn't get the same result.)
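On the sparse aside in question 2: scipy.sparse.csr_matrix(v) turns the 1-D vector into a 1×n sparse matrix, so the product comes back 2-D rather than as a vector, which would explain the mismatched result. A small sketch (reduced sizes for illustration) showing that flattening recovers the same values as np.dot(v, m):

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
n, k = 1000, 5

# Sparse 0/1 vector and a dense uint8 matrix, as in the question.
v = np.zeros(n, dtype=np.int8)
v[rng.choice(n, size=k, replace=False)] = 1
m = rng.integers(0, 256, size=(n, n), dtype=np.uint8)

dense = np.dot(v, m)              # shape (n,)

# csr_matrix(v) is a 1 x n sparse matrix, so its product is 2-D ...
sparse_2d = csr_matrix(v).dot(m)  # shape (1, n)
# ... and needs flattening before comparing with the dense result.
sparse = np.asarray(sparse_2d).ravel()

assert sparse.shape == dense.shape
assert np.array_equal(sparse, dense)
```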
  • I would suspect that `m = m.astype(np.float32)` is the only version that actually uses the BLAS/LAPACK library, and the others only the fallback. – Paul Brodersen Apr 10 '19 at 09:51
  • You could convert your data to `uint16` to prevent int overflow and then use [einsum](https://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html): `result=np.einsum('ij,j', m, v)`. The reason for slow code could be the one mentioned by @PaulBrodersen, but it could also be that `dot` is checking for int overflow and promoting data accordingly, which takes time – Brenlla Apr 10 '19 at 10:54
  • btw, if the non-zero elements of `v` are all ones, you can simply index the rows of `m` and do a sum: `m[v.astype(bool)].sum(0)` – Brenlla Apr 10 '19 at 11:58
  • @Brenlla They are not all 0/1, but that is nice to know :-) – Martin Thoma Apr 10 '19 at 12:23
  • BLAS seems to be most important. After making sure I have libblas / installing the Ubuntu version instead of the pypi-version, I get way different numbers. See [What is the easiest way to install numpy with LAPACK/BLAS?](https://stackoverflow.com/q/26942148/562769). Now all numbers are below 0.5s – Martin Thoma Apr 10 '19 at 13:07
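The two comment suggestions are easy to verify against each other. A sketch checking that the einsum form and the boolean-index-plus-sum form (the latter valid only because the non-zero entries of v are all 1) agree with np.dot; note the einsum is spelled 'i,ij->j' so it matches v @ m without relying on m being symmetric, unlike the 'ij,j' form from the comment:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 5

v = np.zeros(n, dtype=np.int8)
v[rng.choice(n, size=k, replace=False)] = 1
m = rng.integers(0, 256, size=(n, n), dtype=np.uint8)

reference = np.dot(v, m)

# einsum equivalent of v @ m (no symmetry assumption on m).
via_einsum = np.einsum('i,ij->j', v, m)

# Row-indexing trick: pick the rows where v is non-zero and sum them.
# Only valid when the non-zero entries of v are exactly 1.
via_index = m[v.astype(bool)].sum(axis=0)

assert np.array_equal(via_einsum, reference)
assert np.array_equal(via_index, reference)
```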

0 Answers