I have an app where the following happens:
- starts a thread to generate "work"
- this thread then starts a thread pool with 5 workers to generate "work" and put it onto a FIFO queue
- starts a thread pool of 20 workers to get work from the FIFO queue and execute it on a thread in the pool
When running just one piece of "work" through the system, it works great. When running multiple, it starts failing.
I logged the id() of the objects retrieved from the queue, and it seems that the same memory address is being reused repeatedly rather than each object being stored at a new address. I therefore suspect a data race: multiple threads are accessing what is, in my view, supposed to be a different object, but through the same memory address, thereby overwriting each other's attributes.
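I'm aware that id() values are only guaranteed to be unique among objects whose lifetimes overlap, so address reuse on its own wouldn't necessarily mean anything. A standalone toy example (not from my app) of that behaviour in CPython:

class Thing:
    pass

a = Thing()
first_id = id(a)
del a                        # drop the only reference; CPython frees the object
b = Thing()
print(first_id == id(b))     # frequently True: the freed memory is reused for b

In my case, though, the work items should all be alive at the same time, so I don't think this explains what I'm seeing.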
See the following snippet from the log:
[2023-02-16 14:33:02,695] INFO | App started with main PID: 26600
[2023-02-16 14:33:02,695] DEBUG | Max workers: 20
[2023-02-16 14:33:02,695] DEBUG | Max queue size: 60
[2023-02-16 14:33:02,695] INFO | Creating a work queue with size: 60
[2023-02-16 14:33:02,695] INFO | Starting the work generator thread
[2023-02-16 14:33:02,696] INFO | Creating a work consumer thread pool with max workers: 20
[2023-02-16 14:33:02,697] INFO | Found automation 'automation_d'
[2023-02-16 14:33:02,697] DEBUG | Submitting automation file to the work generator thread pool for execution
>>>>>>>>>>>>>>>>>>>id()==140299908643808
[2023-02-16 14:33:03,181] DEBUG | Putting 'T2149393' on to the queue for automation 'automation_d'
[2023-02-16 14:33:03,181] DEBUG | Putting 'T2149388' on to the queue for automation 'automation_d'
[2023-02-16 14:33:03,181] DEBUG | Putting 'T2149389' on to the queue for automation 'automation_d'
[2023-02-16 14:33:03,198] DEBUG | Retrieved a work item from the queue
[2023-02-16 14:33:03,198] DEBUG | Submitting work to the work consumer thread pool for execution
[2023-02-16 14:33:03,199] DEBUG | ==========================================================================================
>>>>>>>>>>>>>>>>>>>id()==140299908643808
[2023-02-16 14:33:03,199] DEBUG | <automation.TAutomation object at 0x7f9a1e377be0>
[2023-02-16 14:33:03,199] DEBUG | Task(num="T2149393", req="R2396580", who="", grp="AG1", desc="REQ - T"
[2023-02-16 14:33:03,199] DEBUG | ==========================================================================================
[2023-02-16 14:33:03,199] INFO | Running automation_d against T2149393 with internal automation id 18aa2e51-c94d-4d83-a033-44e30cca9dd3 in thread 140299891414784
[2023-02-16 14:33:03,199] INFO | Assigning T2149393 to API user
[2023-02-16 14:33:03,199] DEBUG | Retrieved a work item from the queue
[2023-02-16 14:33:03,201] DEBUG | Submitting work to the work consumer thread pool for execution
[2023-02-16 14:33:03,202] DEBUG | ==========================================================================================
>>>>>>>>>>>>>>>>>>>id()==140299908643808
[2023-02-16 14:33:03,202] DEBUG | <automation.TAutomation object at 0x7f9a1e377be0>
[2023-02-16 14:33:03,202] DEBUG | Task(num="T2149388", req="R2396575", who="", grp="AG1", desc="REQ - T"
[2023-02-16 14:33:03,202] DEBUG | ==========================================================================================
[2023-02-16 14:33:03,202] INFO | Running automation_d against T2149388 with internal automation id 18aa2e51-c94d-4d83-a033-44e30cca9dd3 in thread 140299883022080
[2023-02-16 14:33:03,202] DEBUG | Retrieved a work item from the queue
[2023-02-16 14:33:03,202] INFO | Assigning T2149388 to API user
[2023-02-16 14:33:03,203] DEBUG | Submitting work to the work consumer thread pool for execution
[2023-02-16 14:33:03,204] DEBUG | ==========================================================================================
>>>>>>>>>>>>>>>>>>>id()==140299908643808
[2023-02-16 14:33:03,204] DEBUG | <automation.TAutomation object at 0x7f9a1e377be0>
[2023-02-16 14:33:03,204] DEBUG | Task(num="T2149389", req="R2396576", who="", grp="AG1", desc="REQ - T"
[2023-02-16 14:33:03,205] DEBUG | ==========================================================================================
[2023-02-16 14:33:03,205] INFO | Running automation_d against T2149389 with internal automation id 18aa2e51-c94d-4d83-a033-44e30cca9dd3 in thread 140299670124288
As can be seen above, the id() is the same for all three executions. The memory address in the object's repr is also identical each time, as is the internal automation id, which is an attribute on the object. In other words, by the time a work item has been put onto the queue, consumed, and handed to another thread for execution, every thread holds a reference to the same object, which I believe is what is causing the execution to fail in weird ways.
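As far as I understand, queue.Queue stores references rather than copies, so whatever object is put on the queue is exactly the object the consumer gets back. A standalone sketch of that (not my app code):

from queue import Queue
from threading import Thread

class Payload:
    pass

q = Queue()
item = Payload()
print(id(item))                  # producer side

def consume():
    got = q.get()
    print(id(got), got is item)  # same id, same object: the queue did not copy it

q.put(item)                      # a reference is stored, not a copy
t = Thread(target=consume)
t.start()
t.join()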
The code sample below is not intended to be a reproducible way to generate the error or the log above; it is a visualisation of how the app is currently structured. There is far too much code and custom logic to share in full here.
Rough, high-level code here:
import json
import os
import sys
import time
from concurrent.futures import (CancelledError, Future, ThreadPoolExecutor,
                                TimeoutError)
from dataclasses import dataclass
from logging import Logger
from pathlib import Path, PurePath
from queue import Empty, Full, Queue
from threading import Event, Thread
from types import FrameType
from typing import Any, Dict, List, Optional

import requests
import urllib3


@dataclass()
class WorkItem:
    automation_object: Automation
    target: AutomationTarget
    config: AutomationConfig


def generate_work(work_queue, app_config, automation_file, automation_name):
    # Build one automation object per automation file, then create a WorkItem
    # for every record it returns and put it on the shared queue.
    automation_config_raw = load_automation_file(automation_file)
    validate_automation_file(automation_config=automation_config_raw)
    automation_config = build_automation_config(
        automation_name=automation_name,
        automation_config_raw=automation_config_raw,
        log_dir=app_config.log_dir
    )
    automation_object = build_automation(automation_config=automation_config)
    records = automation_object.get_records()
    for record in records:
        work_item = WorkItem(
            automation_object=automation_object,
            target=record,
            config=automation_config
        )
        work_queue.put(item=work_item, block=False)


def work_generator(stop_app_event, app_config, app_logger, work_queue):
    # Work generator thread: submits one generate_work job per automation file
    # to a pool of 5 workers.
    work_generator_thread_pool = ThreadPoolExecutor(max_workers=5)
    while True:
        automation_files = get_automation_files(app_config.automations_dir)
        for automation_file in automation_files:
            automation_name = PurePath(automation_file).stem
            work_generator_thread_pool.submit(
                generate_work, work_queue, app_config, automation_file, automation_name
            )


def main():
    # Consumer side: pull WorkItems off the queue and execute them on a
    # pool of 20 workers.
    work_generator_thread = Thread(
        target=work_generator,
        args=(stop_app_event, app_config, app_logger, work_queue)
    )
    work_generator_thread.start()
    work_consumer_thread_pool = ThreadPoolExecutor(max_workers=max_workers)
    while True:
        work_item = work_queue.get()
        work_consumer_thread_pool.submit(work_item.automation_object.execute, work_item.target)


if __name__ == "__main__":
    main()
So, at a high level, we have one thread generating work using a thread pool, and another thread consuming and executing work from the queue.
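To illustrate the failure mode I suspect, here is another standalone sketch (again not my real code, and the class is just a stand-in): one shared object handed to every worker, with each submission overwriting the attribute the previous one set:

from concurrent.futures import ThreadPoolExecutor
import time

class SharedAutomation:
    def execute(self, target):
        self.current_target = target   # shared attribute, overwritten by every call
        time.sleep(0.01)               # simulate work; other threads run meanwhile
        return target, self.current_target

shared = SharedAutomation()            # one instance shared by all workers
with ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(shared.execute, f"T{i}") for i in range(5)]
    for f in futures:
        submitted, seen = f.result()
        if submitted != seen:
            # usually prints several of these, since the last writer wins
            print(f"submitted {submitted} but the object now points at {seen}")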
Why is Python re-using the same piece of memory repeatedly and how can I force it to use a new piece of memory when creating these objects?