0

I have a class that subclasses a Thread, and I'm having trouble creating objects with mySQL connections inside it asynchronously. Each MyObject has a mysql connection pool that is created on __init__. I'm running into an issue where creating MyObject causes blocking, despite supposedly being asynchronous. Is there something in mysql-connector that blocks concurrent execution, and how can I work around this?

Before adding threading, here's what MyObject creation looked like:

self.objects = [MyObject(*o, self) for o in objectData]

But delays would accumulate as more instances of MyObject were created

So I tried to creating MyObject with threads:

        self.objects = []
        threads = [
            threading.Thread(
                target=lambda o: self.objects.append(MyObject(*o, self)), 
                args=(o,)
            ) for o in objectData
        ]
        
        for t in threads:
            t.start()
        for t in threads:
            t.join()

Even though object instances are created immediately, it still takes an increasing amount of time to create a connection pool in MyObject. Is there something in mysql-connector that blocks concurrent execution, and how can I work around this?

Edit: This is the code for MyObject:

class MyObject(Thread):
    def __init__(self, *args, manager):
        self.parent = manager

        self.cnxPool = mysql.connector.pooling.MySQLConnectionPool(
            pool_name=f"{get_ident()}",
            pool_size=10,
            host=HOST,
            user=USER,
            password=PASSWORD,
            port=25060
        )

        self.stopEvent = Event()
        Thread.__init__(self)
        self.daemon = True

    def run(self):
        ...

objectData is loaded from json, there are a variable amount of args

knho
  • 19
  • 1
  • 3
  • Can you post more code? Why are you doing this `MyObject(*o, self)` ? Where is `objectData` defined? – Alex W Jun 13 '23 at 19:08
  • Added more code to the original question - is this necessary for the issue, though? – knho Jun 13 '23 at 19:20
  • 1
    Yes, it's very helpful for context, if each object is creating a new connection pool you are likely going to exhaust the maximum connections to the database quickly. – Alex W Jun 13 '23 at 19:24
  • Our database supports up to 1600 connections and with what we're running (plus existing services interacting with it) it is impossible for us to be exceeding 1000 connections. I've verified this in the logs. Can confirm this isn't the issue. – knho Jun 13 '23 at 19:28
  • 1
    You say that creation of MyObject causes blocking, but you are calling the MyObject constructor from the main thread. Secondary threads begin at the first line of the `run` method, not their constructor. When you call Thread.start, *then* the OS actually creates the new thread of execution. Thread.start then calls `run`, at which point your code begins. So in the code you presented, object creation is *not* asynchronous. – Paul Cornelius Jun 14 '23 at 10:25

1 Answers1

0

(1) You should avoid subclassing threading.Thread itself - you want to run your code in a new thread, not that your code be a thread. Use the target argument instead.

As correctly verified in the comments by @Paul Cornelius, you are sub-classing it incorrectly: it is the run method in the thread class that executes in the new thread - its __init__ method is executed in the same thread as the initial thread.

(2) creating a new pool connection in each thread does sound strange. If you need that much connections, you should really just create a pool with enough size in the main thread (or in parallel in a secondary thread), and have all your worker threads use the same pool. Unless you've already benchmarked things around, and are very confident you are better of with each thread managing its own small pool (again, this sounds strange, because typically each thread will just use one connection at a time, unless you are running an async loop in each thread for a second-level of concurrency)


def runner(*args, pool, manager):
   """a simple callable that works as target for each new thread.
      No need to subclass threading.Thread
   """
   obj = MyObject(*args, pool=pool, manager=manager)
   obj.run()


class MyObject:
    def __init__(self, *args, pool, manager):
        self.parent = manager

        self.cnxPool = pool


        self.stopEvent = Event()
        Thread.__init__(self)
        self.daemon = True

    def run(self):
        ...

# Now, we don't have the code you use to start the threads, 
# so I can give just a general example. PLEASE, _do_ provide
# a minimal, repeatable, self contained, _complete_ example
# when asking questions


def main():
    n_threads = 100 # whatever
    ...
    pool = mysql.connector.pooling.MySQLConnectionPool(
            pool_name=f"{get_ident()}",
            pool_size=10 * n_threads,
            host=HOST,
            user=USER,
            password=PASSWORD,
            port=25060
        )
    manager = ...
    args = ...
    ...
    workers = []
    for i in range(n_workers):
         args = ...

         worker = threading.Thread(target=runner, args=args, 
               kwargs={"pool": pool, "manager": manager})
         workers.append(worker)
         worker.start()
   ...

main()

This way, each "runner" instance is already called in another thread, unlike the __init__ method of a subclass of Thread. Also, everything points to you creating 10X more connections than are actually used in your orignal code - I kept the number here in "10 * n_threads" , but you will likely be good with one connection per thread, unless there is another concurrency level in your code.

jsbueno
  • 99,910
  • 10
  • 151
  • 209