I have a heavy external library class which takes time to initialize and consumes a lot of memory. I want to create it once per task instance, at minimum.

from celery import Task

class NlpTask(Task):
    def __init__(self):
        print('initializing NLP parser')
        self._parser = nlplib.Parser()
        print('done initializing NLP parser')

    @property
    def parser(self):
        return self._parser

@celery.task(base=NlpTask)
def my_task(arg):
    x = my_task.parser.process(arg)
    # etc.

Celery starts 32 worker processes, so I'd expect "initializing ... done" to be printed 32 times, since I assume one task instance is created per worker process. Surprisingly, it is printed only once. What actually happens here? Thanks.

davka

2 Answers

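Your NlpTask is instantiated once, when the task is registered with the worker, not once per worker process. With the default prefork pool, registration happens in the main process before the pool processes are forked, so the children inherit that instance instead of building their own.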

If you have two tasks like

@celery.task(base=NlpTask)
def foo(arg):
    pass


@celery.task(base=NlpTask)
def bar(arg):
    pass

Then, when you start a worker, you will see two initializations, one per registered task.

If you want to initialize it once per worker process, you can use the worker_process_init signal.

from celery.signals import worker_process_init


@worker_process_init.connect()
def setup(**kwargs):
    print('initializing NLP parser')
    # setup
    print('done initializing NLP parser')

Now, when you start a worker, you will see setup called once by each pool process.

Chillar Anand

Regarding this comment:

that's my point - I'd expect once per worker, and it seems like once per celery instance. I edited the question – @davka

The answer should be to use a sender filter in connect, like:

@worker_process_init.connect(sender='xx')
def func(sender, **kwargs):
    if sender == 'xx':
        # do something
        pass

However, I found that this does not work in Celery 4.0.2.

Ruli
tomy0608