I have an app where the following happens:
- starts a thread to generate "work"
- this thread then starts a thread pool with 5 workers to generate "work" and put it onto a FIFO queue
- starts a thread pool of 20 workers to get work from the FIFO queue and execute it on a thread in the pool
When running just one piece of "work" through the system, it works great. When running multiple, it starts failing.
I logged the id() of the objects retrieved from the queue, and it seems that the same memory address is being reused repeatedly rather than each object being stored at a new address. I therefore suspect a data race: multiple threads are accessing what is, in my view, supposed to be a different object, but through the same memory address, thereby overwriting each other's attributes.
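I'm aware that id() values are only guaranteed to be unique among objects whose lifetimes overlap, so address reuse on its own wouldn't necessarily mean anything. A standalone toy example (not from my app) of that behaviour in CPython:

class Thing:
    pass

a = Thing()
first_id = id(a)
del a                        # drop the only reference; CPython frees the object
b = Thing()
print(first_id == id(b))     # frequently True: the freed memory is reused for b

In my case, though, the work items should all be alive at the same time, so I don't think this explains what I'm seeing.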
See the following snippet from the log:
[2023-02-16 14:33:02,695] INFO | App started with main PID: 26600
[2023-02-16 14:33:02,695] DEBUG | Max workers: 20
[2023-02-16 14:33:02,695] DEBUG | Max queue size: 60
[2023-02-16 14:33:02,695] INFO | Creating a work queue with size: 60
[2023-02-16 14:33:02,695] INFO | Starting the work generator thread
[2023-02-16 14:33:02,696] INFO | Creating a work consumer thread pool with max workers: 20
[2023-02-16 14:33:02,697] INFO | Found automation 'automation_d'
[2023-02-16 14:33:02,697] DEBUG | Submitting automation file to the work generator thread pool for execution
>>>>>>>>>>>>>>>>>>>id()==140299908643808
[2023-02-16 14:33:03,181] DEBUG | Putting 'T2149393' on to the queue for automation 'automation_d'
[2023-02-16 14:33:03,181] DEBUG | Putting 'T2149388' on to the queue for automation 'automation_d'
[2023-02-16 14:33:03,181] DEBUG | Putting 'T2149389' on to the queue for automation 'automation_d'
[2023-02-16 14:33:03,198] DEBUG | Retrieved a work item from the queue
[2023-02-16 14:33:03,198] DEBUG | Submitting work to the work consumer thread pool for execution
[2023-02-16 14:33:03,199] DEBUG | ==========================================================================================
>>>>>>>>>>>>>>>>>>>id()==140299908643808
[2023-02-16 14:33:03,199] DEBUG | <automation.TAutomation object at 0x7f9a1e377be0>
[2023-02-16 14:33:03,199] DEBUG | Task(num="T2149393", req="R2396580", who="", grp="AG1", desc="REQ - T"
[2023-02-16 14:33:03,199] DEBUG | ==========================================================================================
[2023-02-16 14:33:03,199] INFO | Running automation_d against T2149393 with internal automation id 18aa2e51-c94d-4d83-a033-44e30cca9dd3 in thread 140299891414784
[2023-02-16 14:33:03,199] INFO | Assigning T2149393 to API user
[2023-02-16 14:33:03,199] DEBUG | Retrieved a work item from the queue
[2023-02-16 14:33:03,201] DEBUG | Submitting work to the work consumer thread pool for execution
[2023-02-16 14:33:03,202] DEBUG | ==========================================================================================
>>>>>>>>>>>>>>>>>>>id()==140299908643808
[2023-02-16 14:33:03,202] DEBUG | <automation.TAutomation object at 0x7f9a1e377be0>
[2023-02-16 14:33:03,202] DEBUG | Task(num="T2149388", req="R2396575", who="", grp="AG1", desc="REQ - T"
[2023-02-16 14:33:03,202] DEBUG | ==========================================================================================
[2023-02-16 14:33:03,202] INFO | Running automation_d against T2149388 with internal automation id 18aa2e51-c94d-4d83-a033-44e30cca9dd3 in thread 140299883022080
[2023-02-16 14:33:03,202] DEBUG | Retrieved a work item from the queue
[2023-02-16 14:33:03,202] INFO | Assigning T2149388 to API user
[2023-02-16 14:33:03,203] DEBUG | Submitting work to the work consumer thread pool for execution
[2023-02-16 14:33:03,204] DEBUG | ==========================================================================================
>>>>>>>>>>>>>>>>>>>id()==140299908643808
[2023-02-16 14:33:03,204] DEBUG | <automation.TAutomation object at 0x7f9a1e377be0>
[2023-02-16 14:33:03,204] DEBUG | Task(num="T2149389", req="R2396576", who="", grp="AG1", desc="REQ - T"
[2023-02-16 14:33:03,205] DEBUG | ==========================================================================================
[2023-02-16 14:33:03,205] INFO | Running automation_d against T2149389 with internal automation id 18aa2e51-c94d-4d83-a033-44e30cca9dd3 in thread 140299670124288
As can be seen above, the id() is the same for all three executions. The memory address in the object's repr is also identical each time, as is the internal automation id, which is an attribute on the object. In other words, by the time a work item has been put onto the queue, consumed, and handed to another thread for execution, every thread holds a reference to the same object, which I believe is what is causing the execution to fail in weird ways.
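As far as I understand, queue.Queue stores references rather than copies, so whatever object is put on the queue is exactly the object the consumer gets back. A standalone sketch of that (not my app code):

from queue import Queue
from threading import Thread

class Payload:
    pass

q = Queue()
item = Payload()
print(id(item))                  # producer side

def consume():
    got = q.get()
    print(id(got), got is item)  # same id, same object: the queue did not copy it

q.put(item)                      # a reference is stored, not a copy
t = Thread(target=consume)
t.start()
t.join()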
The code sample below is not intended to be a reproducible way to generate the error or the log above; it is a visualisation of how the app is currently structured. There is far too much code and custom logic to share in full here.
Rough, high-level code here:
import json
import os
import sys
import time
from concurrent.futures import (CancelledError, Future, ThreadPoolExecutor,
                                TimeoutError)
from dataclasses import dataclass
from logging import Logger
from pathlib import Path, PurePath
from queue import Empty, Full, Queue
from threading import Event, Thread
from types import FrameType
from typing import Any, Dict, List, Optional

import requests
import urllib3


@dataclass()
class WorkItem:
    automation_object: Automation
    target: AutomationTarget
    config: AutomationConfig


def generate_work(work_queue, app_config, automation_file, automation_name):
    # Build one automation object per automation file, then create a WorkItem
    # for every record it returns and put it on the shared queue.
    automation_config_raw = load_automation_file(automation_file)
    validate_automation_file(automation_config=automation_config_raw)
    automation_config = build_automation_config(
        automation_name=automation_name,
        automation_config_raw=automation_config_raw,
        log_dir=app_config.log_dir
    )
    automation_object = build_automation(automation_config=automation_config)
    records = automation_object.get_records()
    for record in records:
        work_item = WorkItem(
            automation_object=automation_object,
            target=record,
            config=automation_config
        )
        work_queue.put(item=work_item, block=False)


def work_generator(stop_app_event, app_config, app_logger, work_queue):
    # Work generator thread: submits one generate_work job per automation file
    # to a pool of 5 workers.
    work_generator_thread_pool = ThreadPoolExecutor(max_workers=5)
    while True:
        automation_files = get_automation_files(app_config.automations_dir)
        for automation_file in automation_files:
            automation_name = PurePath(automation_file).stem
            work_generator_thread_pool.submit(
                generate_work, work_queue, app_config, automation_file, automation_name
            )


def main():
    # Consumer side: pull WorkItems off the queue and execute them on a
    # pool of 20 workers.
    work_generator_thread = Thread(
        target=work_generator,
        args=(stop_app_event, app_config, app_logger, work_queue)
    )
    work_generator_thread.start()
    work_consumer_thread_pool = ThreadPoolExecutor(max_workers=max_workers)
    while True:
        work_item = work_queue.get()
        work_consumer_thread_pool.submit(work_item.automation_object.execute, work_item.target)


if __name__ == "__main__":
    main()
So, at a high level, we have one thread generating work using a thread pool, and another thread consuming and executing work from the queue.
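To illustrate the failure mode I suspect, here is another standalone sketch (again not my real code, and the class is just a stand-in): one shared object handed to every worker, with each submission overwriting the attribute the previous one set:

from concurrent.futures import ThreadPoolExecutor
import time

class SharedAutomation:
    def execute(self, target):
        self.current_target = target   # shared attribute, overwritten by every call
        time.sleep(0.01)               # simulate work; other threads run meanwhile
        return target, self.current_target

shared = SharedAutomation()            # one instance shared by all workers
with ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(shared.execute, f"T{i}") for i in range(5)]
    for f in futures:
        submitted, seen = f.result()
        if submitted != seen:
            # usually prints several of these, since the last writer wins
            print(f"submitted {submitted} but the object now points at {seen}")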
Why is Python re-using the same piece of memory repeatedly and how can I force it to use a new piece of memory when creating these objects?