
Say I have a vector v(ij) and a matrix M(jk).

import numpy as np

v = np.array([1.0, 1.0, 0.0, 0.0], dtype=np.float32)

M = np.array([[0.0, 0.0, 0.2, 0.0],
              [0.0, 0.0, 0.0, -0.1],
              [0.0, 0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0]], dtype=np.float32)

How can I implement the following operation without a for loop?

def matmul_in_place(vector, matrix):
    # Sequential update: by the time vector[i] is computed, the entries
    # vector[j] with j < i have already been overwritten with new values.
    for i in range(vector.shape[0]):
        for j in range(matrix.shape[1]):
            vector[i] += vector[j] * matrix[j][i]

The result of matmul_in_place(v, M) should be [1.0, 1.0, 0.2, 0.0].
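Running the loop version on the arrays above confirms this:

matmul_in_place(v, M)
print(v)  # -> [1.  1.  0.2 0. ]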

In place is not a hard requirement. What it means is that when calculating vector[1], I should use the already-updated vector[0] for vector[j] when j is 0, so the vector has to be computed from left to right.

Vector v can also be a matrix, but I want to keep the problem simple. As long as the two matrices can be multiplied together (M1 @ M2), this operation should also work.
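For reference, here is a minimal loop-free sketch that only holds under an extra assumption: M must be strictly upper triangular, as in the example above (the name matmul_in_place_solve is made up for illustration). Under that assumption, every vector[j] read while computing vector[i] is already final, so the recurrence is v_out[i] = v_in[i] + sum over j < i of v_out[j] * M[j, i], i.e. v_out @ (I - M) = v_in, a triangular linear system:

def matmul_in_place_solve(vector, matrix):
    # Only valid when matrix is strictly upper triangular:
    # then v_out @ (I - matrix) = v_in, so solve the transposed system.
    n = vector.shape[0]
    eye = np.eye(n, dtype=vector.dtype)
    # (I - matrix)^T x = vector  <=>  x = vector @ inv(I - matrix)
    vector[:] = np.linalg.solve((eye - matrix).T, vector)

The same idea covers the matrix-valued case: for a 2-D v whose rows are independent state vectors, np.linalg.solve((eye - matrix).T, v.T).T gives the batched result.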

I've been looking into it for a while, but I'm not familiar with einsum or custom CUDA kernels, so I don't even know if this is possible.

  • What are `v` and `S` in the function? `vector` and `matrix`? – mozway Jan 18 '22 at 17:04
  • @mozway Sorry for the confusion. I just updated them to vector and matrix. – Gestalt Jan 18 '22 at 17:08
  • At first sight, it looks like your algorithm is very iterative (i.e., the result of one cell depends on previous iterations) and the order of the operations matters. I am not sure you can easily vectorize this. Can you give more context? What is it doing? – mozway Jan 18 '22 at 17:18
  • @mozway It's a representation of a non-typical neural network. v is the state vector of all neurons and the matrix M is all synapses. Though I don't think the background is that important, you can read more here: https://stats.stackexchange.com/q/459491/341008 – Gestalt Jan 18 '22 at 17:25
  • I don't think this can run in parallel because it has to go from left to right. But at least I should be able to JIT it for some acceleration, right? – Gestalt Jan 18 '22 at 17:29
  • 1
    This looks like an iterative algorithm where loops can't be avoided. Using Numba `@nb.njit(fastmath=True)`, it should be at least 100 times faster on a larger example. If you use matrix in more than one calculation a real transpose which is only done once would be recommendable. `matrix=np.ascontiguousarray(matrix.T)` and changing the corresponding part in the code. Because of better memory access pattern you can gain another 50%. – max9111 Jan 19 '22 at 15:27
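A minimal sketch of what that comment describes, assuming the transposed matrix is prepared once up front (the name matmul_in_place_nb is made up for illustration):

import numba as nb
import numpy as np

@nb.njit(fastmath=True)
def matmul_in_place_nb(vector, matrix_T):
    # matrix_T = np.ascontiguousarray(matrix.T), so matrix_T[i, j] == matrix[j, i]
    # and the inner loop reads a contiguous row instead of striding down a column.
    for i in range(vector.shape[0]):
        for j in range(matrix_T.shape[0]):
            vector[i] += vector[j] * matrix_T[i, j]

M_T = np.ascontiguousarray(M.T)  # real transpose, done once
matmul_in_place_nb(v, M_T)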

0 Answers