
I am working with Python 3.10 in the PyCharm IDE.

I have a fairly long piece of code with loops that I accelerated using numba. The code runs fine if I don't pass parallel=True to the @njit decorator, but if I do, the error Process finished with exit code -1073741571 (0xC00000FD) shows up in the console after a short while.

I am working with long arrays (the project is about the dynamics of a population of 1000 nodes or more, so I have quite a few 1D numpy arrays of that length).

The main part of the program is a big for loop that is repeated for K iterations, with other for loops nested inside it. The program runs several experiments with the same parameters. I have tried reducing the population from N=1000 to N=100: when I do this, if an experiment converges fast (not many iterations of the main for loop) it runs fine, but if it doesn't converge fast, it crashes with the error mentioned above.

There are no recursive functions in the code.

In case this might be the issue: besides many numpy arrays, the code uses one numba typed list of 1D numpy arrays of different lengths (ideally I would convert it to a numpy array of arrays, but that requires dtype=object, which numba doesn't accept).
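
For reference, the ragged structure described above is built roughly like this (a minimal sketch, not the actual code; the array contents are made up purely for illustration):

import numpy as np
from numba.typed import List

# A numba typed list whose elements are 1D integer arrays of different lengths
# (e.g. the neighbor list of each node); the contents here are illustrative only.
neighbors = List()
neighbors.append(np.array([1, 2, 3], dtype=np.int64))
neighbors.append(np.array([0, 2], dtype=np.int64))
neighbors.append(np.array([0, 1, 3, 4], dtype=np.int64))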

I have tried increasing PyCharm's maximum heap size, but it did not help.

I am aware that I am not being very explicit about the code or what it does, and I apologize. Please ask for any specific information that could help.

Hope someone can give me a hand!

EDIT: I just realized that for a specific set of parameters (N = 1000), it always crashes during the 31st iteration. For N=500, it crashes at the 62nd iteration.

EDIT 2: This is the code of the function that iterates. I've seen that it often ends up crashing during a call to the check_punishers function.

import numpy as np
from numba import njit
from numba.typed import List


@njit(parallel=True)
def iterate(neighbors, degree, punishers, pnetwork, strategies, cost, cycles, r, size, fermi_temp, mu, defect_cost, punishment_cost):

    convergence = 0

    cooperators = np.zeros(cycles)

    for iteration in range(cycles):                 # Cycles = steps of simulation

        total_coop = strategies.size - np.sum(strategies)
        cooperators[iteration] = total_coop

        # UPDATING PAYOFFS (FITNESS):

        fitness = np.zeros(size)

        for node in range(size): # For each PGG
            neighbors_strat_inv = 1-strategies[np.asarray(neighbors[node])]     # Array with 1s for cooperating neighbors and 0s for defecting neighbors

            total_cost = 0
            for i in range(degree[node]):                                              # Explicit loop instead of np.dot, because numba doesn't support dot products on integer arrays
                total_cost += cost[neighbors[node]][i]*neighbors_strat_inv[i]          # The total prize pool is the sum of each neighbor's cost*inv_strat plus the node's own
            total_cost += cost[node]*(1-strategies[node])
            # Payoff for the defectors (payoff for the cooperators will be this minus their cost)
            payoff_defect = total_cost*r/(degree[node]+1)

            if strategies[node] == 1:
                fitness[node] += payoff_defect
                for pun in check_punishers(node, node, neighbors, strategies, punishers, pnetwork):
                    fitness[node] += - defect_cost
                    fitness[pun] += - defect_cost * punishment_cost
            elif strategies[node] == 0:
                fitness[node] += payoff_defect - cost[node]


            for neighbor in neighbors[node]:
                if strategies[neighbor] == 0:
                    fitness[neighbor] += payoff_defect - cost[neighbor]
                if strategies[neighbor] == 1:
                    fitness[neighbor] += payoff_defect
                    for pun in check_punishers(neighbor, node, neighbors, strategies, punishers, pnetwork):
                        fitness[neighbor] += - defect_cost
                        fitness[pun] += - defect_cost * punishment_cost


        # CHECK CONVERGENCE
        if total_coop == 0:
            convergence_step = iteration
            break
        if convergence >= 500:
            convergence_step = iteration
            break
        if total_coop > size * 0.96 :
            convergence += 1
        if total_coop <= size * 0.96 and convergence > 0:
            convergence = 0

        #UPDATING STRATEGIES

        for i in range(size):                                            # Repeat size times: pick a random node, check a random neighbor and possibly update the strategy
            node = np.random.randint(0, size)
            chosen_neighbor = np.random.randint(0, degree[node])            # Choose a neighbor at random
            chosen_node = neighbors[node][chosen_neighbor]

            seed = np.random.rand()                                        # Mutation?
            if seed < mu:
                seed2 = np.random.randint(0, 3)                            # 0, 1 or 2, each with probability 1/3
                if seed2 == 0:                                             # Mutates into a Defector
                    strategies[node] = 1
                    punishers[node] = 0
                elif seed2 == 1:                                           # Mutates into a Cooperator
                    strategies[node] = 0
                    punishers[node] = 0
                elif seed2 == 2:                                           # Mutates into a Punisher
                    strategies[node] = 0
                    punishers[node] = 1

            # IF THERE IS NO MUTATION, THEN IMITATE
            else:
                if strategies[chosen_node] != strategies[node]:         # If strategies are different
                    probability = 1 / (1 + np.exp(fermi_temp * (fitness[node] - fitness[chosen_node])))
                    #if Old_fitness[chosen_node] > Old_fitness[node]:            # If random node is fitter
                        #probability = (Old_fitness[chosen_node] - Old_fitness[node])/(np.amax(Old_fitness[neighbors[node]])-Old_fitness[node])
                    seed = np.random.rand()
                    if seed < probability:
                        strategies[node] = strategies[chosen_node]
                        punishers[node] = punishers[chosen_node]

                elif strategies[chosen_node] == strategies[node] and punishers[chosen_node] != punishers[node]:             # If C vs P:
                    probability = 1 / (1 + np.exp(fermi_temp * (fitness[node] - fitness[chosen_node])))
                    seed = np.random.rand()
                    if seed < probability:
                        punishers[node] = punishers[chosen_node]

    convergence_step = iteration

    return (cooperators, strategies, fitness, convergence_step)

@njit(parallel=True)
def check_punishers(node, center_node, neighbors, strategies, punishers, pnetwork):
    node_punishers = List()
    # Collect the neighbors common to `node` and `center_node` (plus `center_node` itself)
    # that cooperate, are flagged as punishers and have `node` in their pnetwork entry
    for neighbor in np.append(np.intersect1d(neighbors[node], neighbors[center_node]), center_node):
        if strategies[neighbor] == 0 and punishers[neighbor] == 1 and node in pnetwork[neighbor]:
            node_punishers.append(neighbor)

    return node_punishers
  • Can you please provide your source code in the question? – Garrett Hyde Mar 06 '23 at 15:19
  • @GarrettHyde just added the code for the function that iterates. The program crashes after a specific number of iterations (depending on how big I make my N parameter), and it usually happens while calling the function check_punishers. As I said, though, if the N is small and the experiment turns out to converge fast (below the number of iterations that I specified in EDIT 1), it just works. – R. Macaya Mar 06 '23 at 16:21
  • I do not see any parallel operation in the code. Numba could theoretically parallelize operations on large arrays, but at first glance it does not seem that would provide a significant speed-up here. You should not expect Numba to parallelize the loop automatically (this is very hard to do and it generally produces inefficient parallel code anyway). Still, this certainly means there is an issue in the Numba implementation. Can you check there are no out-of-bounds accesses using `boundscheck=True` and `debug=True`? Assuming the code does not have any undefined behaviour, it should be fine. – Jérôme Richard Mar 06 '23 at 17:26
  • @JérômeRichard Thanks! Yes, I probably was expecting numba to do some magic and now I understand it just doesn't work like that. I'll try to optimize the code in some other way (if you have any tips now that you've seen it, they would be very welcome!). I ran it with those two flags; the debug one gave me this error 4 times: `Code\lib\site-packages\numba\core\lowering.py:107: NumbaDebugInfoWarning: Could not find source for function: . Debug line information may be inaccurate. warnings.warn(NumbaDebugInfoWarning(msg))` – R. Macaya Mar 06 '23 at 17:39
  • This is unexpected. It looks like the debug flag does not like the parallelization of some function (this is an issue with Numba). It also means some functions are parallelized, but as said before, that may not make them faster. I advise you to use `prange` for the parallelization (assuming the code can be parallelized) instead of relying on the automatic parallelization of vectorized functions, because doing fork-join many times is expensive. Note that `prange` does not check whether the parallelization is valid, so the result could be wrong, or crash, if it is not. – Jérôme Richard Mar 06 '23 at 17:52
  • Anyway, the debug flag is not the most important thing here. If the bounds checking does not find any issue, then I advise you to file a bug on Numba's GitHub repository. Besides, you can use `fastmath=True` if you are sure the code does not deal with special values like NaN, Inf or subnormals, and FP associativity is not critical for the numerical stability of the code. The `check_punishers` function can certainly be optimized, starting with not using `np.append`, which creates a new temporary array. The list can be replaced by a preallocated array with a maximum size (see the sketch after these comments). – Jérôme Richard Mar 06 '23 at 17:58
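
Following up on the last two comments, here is a minimal sketch of what that could look like. It is not the original code: it assumes integer (int64) node indices and that pnetwork[neighbor] is a 1D integer array, replaces np.append and the typed List with a preallocated output array, enables boundscheck=True only while hunting for out-of-bounds accesses, and drops parallel=True since, as noted above, there is nothing to parallelize in this helper:

import numpy as np
from numba import njit

@njit(boundscheck=True)                                   # boundscheck only while debugging the crash
def check_punishers(node, center_node, neighbors, strategies, punishers, pnetwork):
    common = np.intersect1d(neighbors[node], neighbors[center_node])
    out = np.empty(common.size + 1, dtype=np.int64)       # at most all common neighbors plus center_node
    count = 0
    for k in range(common.size + 1):
        neighbor = common[k] if k < common.size else center_node
        if strategies[neighbor] == 0 and punishers[neighbor] == 1:
            for p in pnetwork[neighbor]:                  # explicit membership test for `node`
                if p == node:
                    out[count] = neighbor
                    count += 1
                    break
    return out[:count]

The callers in iterate could then loop over the returned array exactly as they currently loop over the typed list.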
