
In the project I am currently working on, I am not allowed to use an ORM, so I made my own.

It works great, but I am having problems with Celery and its concurrency. For a while, I had it set to 1 (using --concurrency=1), but I'm adding new tasks that take longer to process than the interval at which celery beat schedules them, which causes a huge backlog of tasks.

When I set Celery's concurrency to > 1, here's what happens (pastebin because it's big):

https://pastebin.com/M4HZXTDC

Any idea on how I could implement some kind of lock/wait on the other processes so that the different workers don't cross each other?

Edit: Here is where I set up my PyMySQL instance and how the open and close are handled.

  • Well, they should be doing that, but it seems that they do it at the same time, or there is something I didn't understand in the way I set up my PyMySQL – juleslasne Oct 23 '20 at 15:15
  • I see this in the code: `database_uri=f"sqlite:///{BACKEND_ROOT}/../chatbot_database.sqlite3",` – DejanLekic Oct 23 '20 at 15:23
  • Looks like the database connection is done only in the Flask app. I have no idea if Celery tasks can use it. Best would be to grab those connection details, open a DB session inside a task, do the DML, and close it. I am sure Flask users will know what the issue is... – DejanLekic Oct 23 '20 at 15:30
  • What you saw is linked to something else and is not part of the error. This is a local SQLite database, used only by Celery and not by Flask, and it is not causing the error. The error is caused when multiple threads try to read/write to the DB at the same time – juleslasne Oct 23 '20 at 15:36
  • You have two workers and they are using the same network connection; that will never work, so every instance needs its own – nbk Oct 26 '20 at 13:00
  • In that case, should I change my DB connection setup to some kind of function/factory so that the workers have a separate one? – juleslasne Oct 27 '20 at 13:03
  • @nbk So I tried creating my `pymysql.connect(**database_config)` in a function so the worker would have its own, but I still get the same error – juleslasne Oct 27 '20 at 13:09
  • Try https://github.com/jkklee/pymysql-pool. Multithreading is complex to handle, with concurrency and all its problems. You also always need semaphores (https://docs.python.org/3/library/asyncio-sync.html) to encapsulate critical sections – nbk Oct 27 '20 at 13:32
  • Thank you for pymysql-pool, it's actually what I was trying before I saw your reply. I cannot get it to work right now but I'll keep trying. – juleslasne Oct 27 '20 at 13:35
  • @juleslasne Have you tried using sessions? https://docs.sqlalchemy.org/en/13/orm/contextual.html You need to close the session each time; it seems like the number of connections you are opening is not being accepted by MySQL. – arshpreet Oct 30 '20 at 14:33
  • @arshpreet Sadly on this project I am not allowed to use an ORM, so I had to make my own... – juleslasne Nov 02 '20 at 09:45

1 Answer


PyMySQL does not allow threads to share the same connection (the module itself can be shared, but a connection object cannot be shared between threads). Your Model class is reusing the same connection everywhere.

So, when different workers call on the models to do queries, they are using the same connection object, causing conflicts.

Make sure your connection objects are thread-local. Instead of a `db` class attribute, consider a method (or property) that retrieves a thread-local connection object rather than reusing one that may have been created in a different thread.

For instance, create your connection in the task.
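
As a minimal sketch of that idea (the Celery app, broker URL, task name, query, and the contents of database_config here are assumptions, not taken from your project), a task can open its own connection, use it, and close it:

import pymysql
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

# Assumed connection details -- replace with your own database_config
database_config = {
    "host": "localhost",
    "user": "user",
    "password": "password",
    "database": "mydb",
}


@app.task
def process_item(item_id):
    # A fresh connection per task invocation, so no two workers share one
    connection = pymysql.connect(**database_config)
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT * FROM items WHERE id = %s", (item_id,))
            return cursor.fetchone()
    finally:
        # Always close the connection when the task finishes
        connection.close()

Opening a connection per task call is the simplest fix, though it adds connection overhead to every task; the thread-local approach further down amortizes that cost.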

Right now, you're using a global connection everywhere for every model.

import pymysql

# database_config is defined elsewhere in the project

# Connect to the database once, at import time
connection = pymysql.connect(**database_config)


class Model(object):
    """
    Base Model class, all other Models will inherit from this
    """

    # Every model (and every thread) shares this single connection
    db = connection

To avoid this, you can create the connection in the `__init__` method instead...

class Model(object):
    """
    Base Model class, all other Models will inherit from this
    """

    def __init__(self, *args, **kwargs):
        # Each model instance gets its own connection
        self.db = pymysql.connect(**database_config)

However, this may not be efficient or practical, because every model instance will open its own connection.

To improve on this, you could use `threading.local` to keep connections local to each thread.



import threading

import pymysql


class Model(object):
    """
    Base Model class, all other Models will inherit from this
    """

    # One threading.local() shared by all models; each thread gets its own .db
    _conn = threading.local()

    @property
    def db(self):
        # Create the connection lazily, once per thread
        if not hasattr(self._conn, 'db'):
            self._conn.db = pymysql.connect(**database_config)
        return self._conn.db

Note that a thread-local solution only helps if you're using a threading concurrency model. Celery uses multiple processes (prefork) by default, which may or may not be a problem; if it is, you may be able to work around it by switching the workers to a thread-based pool (--pool=threads) or to eventlet instead.
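
As a rough usage sketch under those assumptions (the User model, task, query, broker URL, and worker command below are illustrative, not taken from your project), each worker thread then opens its own connection the first time a model touches the database:

from celery import Celery

# Model and database_config are as defined above
app = Celery("tasks", broker="redis://localhost:6379/0")


class User(Model):
    """Example model; inherits the thread-local db property from Model."""
    pass


@app.task
def count_users():
    user = User()
    # The first db access in this thread opens a connection;
    # later accesses from the same thread reuse it.
    with user.db.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM users")
        (count,) = cursor.fetchone()
    return count

# Started with a thread-based pool so the thread-local connections apply:
#   celery -A tasks worker --pool=threads --concurrency=4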

  • Oh I see ! I'll try this and let you know how it goes – juleslasne Nov 02 '20 at 09:46
  • I've awarded you the +rep bounty because it would've gone to waste otherwise, even though I haven't solved my problem yet. I'm still trying to figure out how to implement a thread-safe DB connection and all... – juleslasne Nov 02 '20 at 20:51
  • I think it would help a lot if you moved [this line](https://github.com/Seluj78/PyMatcha/blob/feature/chatbot/backend/PyMatcha/utils/orm/_model.py#L19) inside of `__init__` and created a new connection every time: `self.db = pymysql.connect(**database_config)`. Or, you may also find it useful to take advantage of Python's [thread-local data](https://docs.python.org/3/library/threading.html#thread-local-data) interfaces for optimization (only create a connection once for each thread, instead of for every model object) – sytech Nov 02 '20 at 22:39
  • Thank you so much for the edit of your answer, the `db` property worked!! Now it's just a matter of optimizing the task that takes a while, but other than that, it's flawless! Thank you so much! I did have to use `--pool=threads` for it to work! – juleslasne Nov 03 '20 at 12:17