3

Is it possible to vectorize the following code in Python? It runs very slowly when the size of the array becomes large.

import numpy as np

# A, B, C are 3d arrays with shape (K, N, N). 
# Entries in A, B, and C are in [0, 1]. 
# In the following, I use random values in B and C as an example.

K = 5
N = 10000
A = np.zeros((K, N, N))
B = np.random.normal(0, 1, (K, N, N))
C = np.random.normal(0, 1, (K, N, N))

for k in range(K):
    for m in [x for x in range(K) if x != k]:
        for i in range(N):
            for j in range(N):
                if A[m, i, j] not in [0, 1]:
                    if A[k, i, j] == 1:
                        A[m, i, j] = B[m ,i ,j]
                    if A[k ,i, j] == 0:
                        A[m, i, j] = C[m, i, j]

Jayyu
  • 319
  • 2
  • 10
  • In the actual problem, A has values that are not in [0,1]. – Jayyu Jul 05 '22 at 15:53
  • 1
    The solution by @frederick-douglas-pearce vectorizes the inner loops over i and j, which speed up the code a lot because in this case K is small while N is large. I am wondering if it is possible to improve the efficiency even more by vectorizing the outer loops over m and/or k. – Jayyu Jul 05 '22 at 15:58
  • 1
    The double loops over k and m can be reduced to one loop by creating a list of pairs between k and m, then iterating over each pair of k and m. This can save a bit of time when K becomes larger. – Jayyu Jul 06 '22 at 18:22
  • Precomputing all combinations will certainly work, but probably not too much performance improvement there. It is possible to vectorize the `m` loop, so you just do the one loop over `k`; however, there is always a tradeoff with memory consumption. I'll update my answer below to show what I mean – frederick-douglas-pearce Jul 06 '22 at 19:56

2 Answers2

3

I cannot identify a way to vectorize this, but I can suggest using numba package to reduce the computation time. At here, you can import njit with the nogil=True parameter to speed up your code.

from numba import njit

@njit(nogil=True)
def function():
    for k in range(K):
        for m in [x for x in range(K) if x != k]:
            for i in range(N):
                for j in range(N):
                    if A[k, i, j] == 1 and A[m, i, j] not in [0, 1]:
                        A[m, i, j] = B[m ,i ,j]
                    if A[k ,i, j] == 0 and A[m, i, j] not in [0, 1]:
                        A[m, i, j] = C[m, i, j]

%timeit function()
7.35 s ± 252 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With njit and nogil parameter, it took me 7 seconds to run the whole thing, but without the njit, my code is running for hours(and it still is now). Python has a global interpreter lock (GIL) to make sure it sticks to single-threading. By releasing the GIL, you can execute the code in multithreading. However, when using nogil=True, you’ll have to be wary of the usual pitfalls of multi-threaded programming (consistency, synchronization, race conditions, etc.).

You can look at the documentation about Numba here. https://numba.pydata.org/numba-doc/dev/user/jit.html?highlight=nogil

  • 1
    Small nitpick: technically, Numba doesn't do multiprocessing, it actually does threading, and that's if you explicitly enable it via the `parallel=True` flag to `njit`, and then turn one of those `range`s into `numba.prange`. I don't think `nogil` should do anything on its own here. – Dominik Stańczak Jul 05 '22 at 07:14
  • Thanks for the deliberate check on my explanation, @DominikStańczak! Well, it's true that `parallel` is the main contribution to multithreading. Perhaps I have misunderstood the notes they gave on the website :D This is what I found in their website: `Code running with the GIL released runs concurrently with other threads executing Python or Numba code (either the same compiled function, or another one), allowing you to take advantage of multi-core systems. This will not be possible if the function is compiled in object mode.` –  Jul 05 '22 at 07:21
  • 1
    Yeah, but as I understand it (which may be off! which is why this I find this interesting :D ), that means you can compile a func with `nogil`, run a bunch of classic Python threads of the kind that's usually hindered by the GIL and run your compiled function in them, and then they won't interfere with each other via GIL. But that's a large assumption here (we have no idea if @Jayz wants to run this in that way) and what's probably more useful here is `parallel`. – Dominik Stańczak Jul 05 '22 at 07:39
1

I can help with a partial vectorization that should speed things up quite a bit, but I'm not sure on your logic for k vs. m, so didn't try to include that part. Essentially, you create a mask with the conditions you want checked across the 2nd and 3rd dimensions of A. Then map between A and either B or C using the appropriate mask:

# A, B, C are 3d arrays with shape (K, N, N). 
# Entries in A, B, and C are in [0, 1]. 
# In the following, I use random values in B and C as an example.

np.random.seed(10)

K = 5
N = 1000
A = np.zeros((K, N, N))
B = np.random.normal(0, 1, (K, N, N))
C = np.random.normal(0, 1, (K, N, N))

for k in range(K):
    for m in [x for x in range(K) if x != k]:
        #if A[m, i, j] not in [0, 1]:
        mask_1 = A[k, :, :] == 1
        mask_0 = A[k, :, :] == 0
        A[m, mask_1] = B[m, mask_1]
        A[m, mask_0] = C[m, mask_0]

I omitted the A[m, i, j] not in [0, 1] part because this made it difficult to debug since nothing happens (A is initialized as all zeros). If you need to include additional logic like this, just create another mask for it and include it in with an and in each mask's logic.

Update on 7/6/22 If you want to update the above code to remove the loop over m, then you can initialize an array with all the values of k, and use that to expand the mask to include all 3 dimensions, excluding each value of k that matches m as follows:

np.random.seed(10)

K = 5
N = 1000
A_2 = np.zeros((K, N, N))
B = np.random.normal(0, 1, (K, N, N))
C = np.random.normal(0, 1, (K, N, N))
K_vals = np.array(range(K))

for k in range(K):
    #for m in [x for x in range(K) if x != k]:
        #if A[m, i, j] not in [0, 1]:
    k_dim_2_skip = K_vals == k
    mask_1 = np.tile(A_2[k, :, :] == 1, (K, 1, 1))
    mask_1[k_dim_2_skip, :, :] = False
    mask_0 = np.tile(A_2[k, :, :] == 0, (K, 1, 1))
    mask_0[k_dim_2_skip, :, :] = False
    A_2[mask_1] = B[mask_1]
    A_2[mask_0] = C[mask_0]

Use these masks with the & np.logical_not... code you added in the comment below and that should do it. Note the more you vectorize, the larger the arrays you're manipulating for masks, etc. get, so there is a tradeoff with memory consumption. There is usually a sweet spot to balance run time vs memory usage for a given problem.

  • 2
    Mask 1 after including "If A[m ,i , j] not in [0, 1]" can be the following: ```mask_1 = A[k, :, :] == 1 & np.logical_not(np.isin(A[m, :, :], [0, 1])) ``` – Jayyu Jul 05 '22 at 15:09