I have a vector v
which has a lot of 0s and only a couple of non-zero items (mostly 1s). I have a matrix m
which is square, symmetric and has integer values.
Test script
The following script generates similar data and times the part I'm interested in:
#!/usr/bin/env python
import time
import numpy as np
np.random.seed(0)
n = 20000
m = 20
assert n >= m
# Create the vector
v = [1] * m + [0] * (n - m)
np.random.shuffle(v)
v = np.array(v, np.int8)
# Create the matrix
m = np.random.randint(0, 256, size=(n, n), dtype=np.uint8)
m = (m + m.T) / 2
m = m.astype(np.uint8)
# Multiplication
t0 = time.time()
result = np.dot(v, m)
t1 = time.time()
# Check the results
print("result shape: {}".format(result.shape))
print("result[0]: {}".format(result[0])) # should be 1757
print('Time: {:0.2f}s'.format(t1 - t0))
Test results
I checked a couple of variations of the above script:
| Variation | Time |
| ------------------------------------------------- | ------ |
| Original | 21.65s |
| (1) m = m.astype(np.float32) | 0.09s |
| (2) v = v.astype(np.uint8) | 4.87s |
| (3) v = v.astype(np.int16);m = m.astype(np.int16) | 5.77s |
| (4) = (3) + matmul instead of dot | 6.91s |
| (5) = (1) + matmul instead of dot | 0.09s |
Question
Looking at my test results, I have two questions:
- Why is there such a huge difference? Is it type/boundary checks?
- Is there another way to speed it up; e.g. sparse matrices? (I tried to use
v = scipy.sparse.csr_matrix(v)
but I don't get the same result)