
I've been breaking my head for a few days over a simple task (I thought it was simple... not anymore):

The main program sends hundreds of SQL queries to fetch data from multiple DBs. I thought Celery would be the right choice, as it can scale and also simplify the threading/async orchestration.

The "clean" solution would be one generic class supposed to looks something like:

from sqlalchemy import text

@app.task(bind=True, name='fetch_data')
def fetch_data(self, *args, **kwargs):
    db = kwargs['db']
    sql = kwargs['sql']
    session = DBContext().get_session(db)
    result = session.execute(text(sql)).all()
    ...

But I'm having trouble implementing such a DBContext class, which should instantiate once per DB, reuse the DB sessions across requests, and close them once the requests are done (or any other recommendation you might suggest).
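For context, this is roughly the shape I'm aiming for: keying engines and session factories on the DB name, so each pool is created once and then reused (just a sketch, assuming SQLAlchemy; DB_URLS is a placeholder mapping of DB names to connection URLs):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Placeholder: map each DB name to its connection URL
DB_URLS = {
    'db_a': 'postgresql://user:pass@host_a/db_a',
    'db_b': 'postgresql://user:pass@host_b/db_b',
}

class DBContext:
    _engines = {}        # one engine (and connection pool) per DB
    _sessionmakers = {}  # one session factory per DB

    @classmethod
    def get_session(cls, db):
        # Lazily create the engine the first time a DB is requested,
        # then hand out sessions backed by its pool
        if db not in cls._engines:
            cls._engines[db] = create_engine(DB_URLS[db], pool_pre_ping=True)
            cls._sessionmakers[db] = sessionmaker(bind=cls._engines[db])
        return cls._sessionmakers[db]()

Each task would then call get_session(db), run its query, and close the session in a finally block; the engine's pool keeps the underlying connections alive between tasks.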

I was thinking about using a base Task class to decorate the function and keep all the available connections there, but the problem is that such a class can't be initialized dynamically, only once... maybe there's a way to make it work, but I'm not sure how:

from celery import Task

class DatBaseFactory(Task):

    def __call__(self, *args, **kwargs):
        print("In class", self.db)
        self.engine = DBContext.get_db(self.db)
        return super().__call__(*args, **kwargs)
     

@app.task(bind=True, base=DatBaseFactory, name='test_db', db=db, engine='')
def test_db(self, *args, **kwargs):
    print("From Task", self.engine)

The other alternative would be duplicating the function once per DB and "preserving" the sessions there, but that's quite an ugly solution.

Hope someone can help with this trouble...

  • Each Celery task should establish a connection to the DB it "deals with", process data, and close the connection before it returns. If this does not work for you, then you should think of using a connection pooling system like PgBouncer, perhaps. – DejanLekic Feb 09 '23 at 12:30
  • Are you suggesting writing a separate task for each one of the DBs? Hopefully not, as it will cause a lot of code duplication. – AviC Feb 09 '23 at 15:38
