
I am implementing NEAT (an evolutionary algorithm for neural network topologies) and want to run the feed-forward network evaluations in parallel, as they are the bottleneck during training. I am using the MLAgents library to connect with Unity, where simulations are run to evaluate fitness.

My problem is that when I create processes, the MLAgents environment connection is duplicated when the parent process's memory is copied into each worker, which ends up closing the original connection. The connection object is completely irrelevant to the task performed during multiprocessing. How can I keep the connection object from being included in each of the subprocesses?

I have tried separating the multiprocessing code into its own file, as seen below, but it behaves the same.

import multiprocessing as mp


def get_action(network, obs, agent_num, queue):
    # Run one forward pass and tag the result with the agent id so the
    # caller can match results to agents regardless of completion order.
    queue.put([agent_num, network.activate(obs)])

def get_actions(policies,
                fixed_policy,
                fixed_opponent,
                nn_input,
                decision_steps_blue,
                decision_steps_purple,
                agent_count,
                local_to_agent_map):
    # Concurrency things
    num_workers = mp.cpu_count()
    print("CPU Cores: " + str(num_workers))
    pool = mp.Pool(processes=num_workers)  # Problem: Unity connection (MLAgents) being duped
    # A plain mp.Queue() cannot be passed to Pool workers as an argument
    # (pickling it raises RuntimeError); a Manager queue can.
    q = mp.Manager().Queue()

    for agent in range(agent_count):
        if local_to_agent_map[agent] in decision_steps_purple or local_to_agent_map[agent] in decision_steps_blue:
            if local_to_agent_map[agent] in decision_steps_blue or not fixed_opponent:
                policy = policies[agent]
            elif fixed_opponent:
                policy = fixed_policy
            pool.apply_async(get_action, args=(policy, nn_input[agent], agent, q))

    pool.close()
    pool.join()

    return q

The connection object is defined globally (at module level) in the Python script that calls the method above.

from mlagents_envs.environment import UnityEnvironment as UE

env = UE(seed=1, side_channels=[])   # Object to avoid duplicating

I'm sorry if this is a duplicate; I could not find any posts about excluding objects from the memory of subprocesses, only about sharing objects between processes, which is not what I am looking for. I would greatly appreciate any help!

Kristian T
  • Define an `if __name__ == "__main__":` clause in your main module and put the `env = ...` line under there (see the first sketch after these comments). – Charchit Agarwal Mar 07 '23 at 10:46
  • Thank you, that solved the issue! Now I am facing a problem where execution jumps from the point of joining all the processes in the pool straight back to the main function where the first method call was made. – Kristian T Mar 07 '23 at 11:12
  • See if you can include more details about this issue in the question itself so it's clearer – Charchit Agarwal Mar 07 '23 at 12:07
  • I did some debugging, and it's not entirely clear what is happening. It seems to crash with a SIGSEGV error, so it would require some more digging. I have concluded that multiprocessing has far too much overhead for the actual calculation in this case; I have seen a 15x increase in run time per forward-pass phase. But thank you for the help anyway! – Kristian T Mar 07 '23 at 12:45
  • Creating a worker process (on my system) usually takes at least 50 ms before any `import`s are accounted for. To maximize performance you must keep worker processes around for a long time, so the start-up cost is paid rarely (see the second sketch below). `Pool` also has a decent amount of overhead of its own for its housekeeping tasks; using `mp.Process` directly can have a bit less. – Aaron Mar 07 '23 at 17:43
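
A minimal sketch of the guard suggested in the first comment, assuming the environment is created in the script that is started directly (the `main()` wrapper and the `env.close()` call are illustrative, not part of the original code). With the default `spawn` start method on Windows and macOS, every worker re-imports the main module, so a top-level `env = UE(...)` line runs again in each child; under the guard it runs only in the parent:

from mlagents_envs.environment import UnityEnvironment as UE

def main():
    # Runs only in the parent: workers re-import this module under the
    # name "__mp_main__", so this block is skipped in every child.
    env = UE(seed=1, side_channels=[])
    # ... training loop that eventually calls get_actions(...) ...
    env.close()

if __name__ == "__main__":  # note the double underscores
    main()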
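
And a rough sketch of the long-lived-pool idea from the last comment, assuming the network objects are picklable; the names `activate_one` and `evaluate_generation` are placeholders, not part of the question's code:

import multiprocessing as mp

def activate_one(task):
    # Hypothetical per-agent job: one forward pass through one network.
    network, obs = task
    return network.activate(obs)

def evaluate_generation(pool, networks, observations):
    # pool.map() preserves input order, so results[i] belongs to
    # networks[i], and it blocks until every task finishes, which
    # replaces the explicit close()/join() in the question's code.
    return pool.map(activate_one, list(zip(networks, observations)))

if __name__ == "__main__":
    # Create the pool once per run so the per-worker start-up cost is
    # not paid again on every forward-pass phase.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        # Each generation reuses the same workers, e.g.:
        # results = evaluate_generation(pool, networks, observations)
        pass

Even with a reused pool, each task still pickles a network and its observations, so whether this beats single-process evaluation depends on how expensive one `network.activate` call is relative to that transfer cost.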

0 Answers