
I have written a module that dynamically adds periodic Celery tasks based on a list of dictionaries in the project's settings (imported via django.conf.settings). I do that using a function add_tasks that schedules a function to be called with a specific uuid given in the settings:

from django.conf import settings
# my_task is the shared task defined in the app's tasks module

def add_tasks(celery):
    for new_task in settings.NEW_TASKS:
        celery.add_periodic_task(
            new_task['interval'],
            my_task.s(new_task['uuid']),
            name='My Task %s' % new_task['uuid'],
        )
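
For reference, the setting the loop iterates over has roughly this shape (a hypothetical example; only the interval and uuid keys are used above):

NEW_TASKS = [
    {'interval': 60.0, 'uuid': 'example-uuid-1'},
    {'interval': 300.0, 'uuid': 'example-uuid-2'},
]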

As suggested here, I use the on_after_configure.connect signal to call the function in my celery.py:

app = Celery('my_app')

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    from add_tasks_module import add_tasks
    add_tasks(sender)

This setup works fine for both celery beat and celery worker, but it breaks my setup where I use uWSGI to serve my Django application. uWSGI runs smoothly until the first time the view code sends a task using Celery's .delay() method. At that point Celery is initialized inside uWSGI but blocks forever in the above code. If I run this manually from the command line and interrupt it when it blocks, I get the following (shortened) stack trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
    return obj.__dict__[self.__name__]
KeyError: 'tasks'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
    return obj.__dict__[self.__name__]
KeyError: 'data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
    return obj.__dict__[self.__name__]
KeyError: 'tasks'

During handling of the above exception, another exception occurred:
Traceback (most recent call last):

  (SHORTENED HERE. This part just contained the trace from the console down to my call of this function.)

  File "/opt/my_app/add_tasks_module/__init__.py", line 42, in add_tasks
    my_task.s(new_task['uuid']),
  File "/usr/local/lib/python3.6/site-packages/celery/local.py", line 146, in __getattr__
    return getattr(self._get_current_object(), name)
  File "/usr/local/lib/python3.6/site-packages/celery/local.py", line 109, in _get_current_object
    return loc(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/app/__init__.py", line 72, in task_by_cons
    return app.tasks[
  File "/usr/local/lib/python3.6/site-packages/kombu/utils/objects.py", line 44, in __get__
    value = obj.__dict__[self.__name__] = self.__get(obj)
  File "/usr/local/lib/python3.6/site-packages/celery/app/base.py", line 1228, in tasks
    self.finalize(auto=True)
  File "/usr/local/lib/python3.6/site-packages/celery/app/base.py", line 507, in finalize
    with self._finalize_mutex:

It seems like there is a problem with acquiring a mutex: the process blocks inside finalize() while waiting for Celery's _finalize_mutex.

Currently I am using a workaround: I detect whether sys.argv[0] contains uwsgi and, if so, skip adding the periodic tasks, since only beat needs them. But I would like to understand what is going wrong here to solve the problem more permanently.
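
A minimal sketch of that workaround inside add_tasks (only the guard is new; the detection string is an assumption about how uWSGI shows up in sys.argv[0]):

import sys

def add_tasks(celery):
    # Only beat actually needs the periodic tasks; skip registration
    # when the process was started by uWSGI.
    if 'uwsgi' in sys.argv[0]:
        return
    for new_task in settings.NEW_TASKS:
        celery.add_periodic_task(
            new_task['interval'],
            my_task.s(new_task['uuid']),
            name='My Task %s' % new_task['uuid'],
        )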

Could this problem have something to do with running uWSGI multi-threaded or multi-process, where one thread/process holds the mutex that another one needs?

I'd appreciate any hints that can help me solve the problem. Thank you.

I am using: Django 1.11.7 and Celery 4.1.0

Edit 1

I have created a minimal setup for this problem:

celery.py:

import os
from celery import Celery
from django.conf import settings
from myapp.tasks import my_task

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_app.settings')

app = Celery('my_app')

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(
        60,
        my_task.s(),
        name='Testtask'
    )

app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

tasks.py:

from celery import shared_task
@shared_task()
def my_task():
    print('ran')

Make sure that CELERY_TASK_ALWAYS_EAGER=False and that you have a working message queue.
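
For example, in settings.py (the broker URL below is only an assumption; point it at whatever queue you have running):

CELERY_TASK_ALWAYS_EAGER = False
CELERY_BROKER_URL = 'amqp://guest:guest@localhost:5672//'  # assumed local RabbitMQ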

Run:

./manage.py shell -c 'from myapp.tasks import my_task; my_task.delay()'

Wait about 10 seconds before interrupting to see the above error.

Tim
  • Why would you initialize that in a Django process instead of having a dedicated process to do that? – Tarun Lalwani Mar 05 '18 at 12:08
  • I do have a dedicated `celery beat` process, but I need to call `.delay()` from within the django process. That is where the code blocks. – Tim Mar 05 '18 at 14:17
  • So the add_tasks doesn't get called from the Django process? – Tarun Lalwani Mar 05 '18 at 14:30
  • It is called in the `@app.on_after_configure.connect` signal. And that seems to get fired in each process, so in the Django process as well. – Tim Mar 05 '18 at 15:32
  • I would suggest you shouldn't run this from the Django code. If you still want to do the initialization like this, you should have another celery task and call that using `.delay` from the Django code – Tarun Lalwani Mar 05 '18 at 16:16
  • The `@app.on_after_configure.connect` signal is fired automatically by celery. I don't have any control over it, so I can't just not run it. Likewise I have not yet found a way to figure out whether I am in a `beat`, a `worker` or a `django` thread. I could probably create a second `celery.py` just for the `beat` thread, but that seems to me like a rather ugly workaround. – Tim Mar 05 '18 at 22:08
  • I am not able to reproduce the problem. Can you share your uwsgi config that you are using to start django app? – Chillar Anand Mar 06 '18 at 07:43
  • @Tim, You can easily check that using `running_in_uwsgi = 'uwsgi' in sys.modules`. If that is the case just exit from `add_tasks` without doing anything – Tarun Lalwani Mar 06 '18 at 09:50
  • @ChillarAnand I think the setup also works with running the `manage.py runserver` and also directly in a shell. I'll create a minimal setup where I have the error and add it to the question. – Tim Mar 07 '18 at 10:18
  • @TarunLalwani Thank you. This is actually a nicer way of detecting if I am running in uwsgi. Unfortunately I have now also noticed the problem exists when running tasks directly from the shell, so I would need to have multiple exceptions checking for different scenarios. Maybe there is a similar way of detecting if we are running in the `celery beat` thread? – Tim Mar 07 '18 at 10:21
  • @Tim, may be this can help in that https://stackoverflow.com/questions/39003282/how-can-i-detect-whether-im-running-in-a-celery-worker ? – Tarun Lalwani Mar 07 '18 at 13:41

2 Answers


So, I have found out that the @shared_task decorator creates the problem. I can circumvent it when I declare the task right inside the function called by the signal, like so:

def add_tasks(celery):
    @celery.task
    def my_task(uuid):
        print(uuid)

    for new_task in settings.NEW_TASKS:
        celery.add_periodic_task(
            new_task['interval'],
            my_task.s(new_task['uuid']),
            name='My Task %s' % new_task['uuid'],
        )

This solution actually works for me, but it leaves me with one more problem: I use this code in a pluggable app, so I can't directly access the Celery app outside of the signal handler, but I would also like to be able to call the my_task function from other code. Since it is defined inside the function, it isn't available outside of it and cannot be imported anywhere else.

I can probably work around this by defining the task function outside of the signal function and applying different decorators to it here and in tasks.py. I am wondering, though, whether there is a decorator other than @shared_task that I can use in tasks.py that does not create the problem.

The current best solution could be:

task_app.__init__.py:

from django.conf import settings

def my_task(uuid):
    # do stuff
    print(uuid)

def add_tasks(celery):
    # bind the plain function as a task on the concrete app instance
    celery_my_task = celery.task(my_task)
    for new_task in settings.NEW_TASKS:
        celery.add_periodic_task(
            new_task['interval'],
            celery_my_task.s(new_task['uuid']),
            name='My Task %s' % new_task['uuid'],
        )

task_app.tasks.py:

from celery import shared_task
from task_app import my_task
shared_my_task = shared_task(my_task)

myapp.celery.py:

import os
from celery import Celery
from django.conf import settings


# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_app.settings')

app = Celery('my_app')

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    from task_app import add_tasks
    add_tasks(sender)


app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
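
With this layout, other code can use the task through the shared wrapper, for example (a hypothetical call site; the uuid is made up):

from task_app.tasks import shared_my_task

shared_my_task.delay('example-uuid-1')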
Tim

Could you give the @app.on_after_finalize.connect signal a try?

Here is a quick snippet from a working project with celery==4.1.0, Django==2.0, django-celery-beat==1.1.0 and django-celery-results==1.0.1:

@app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
    """Setup of periodic task :py:func:`shopify_data_fetcher.celery.fetch_shopify`
    based on the schedule defined in settings.CELERY_BEAT_SCHEDULE.
    """
    for task_name, task_config in settings.CELERY_BEAT_SCHEDULE.items():
        sender.add_periodic_task(
            task_config['schedule'],
            fetch_shopify.s(**task_config['kwargs']),
            name=task_name
        )

And a piece of the CELERY_BEAT_SCHEDULE:

from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    'fetch_shopify_orders': {
        'task': 'shopify.tasks.fetch_shopify',
        'schedule': crontab(hour="*/3", minute=0),
        'kwargs': {
            'resource_name': shopify_constants.SHOPIFY_API_RESOURCES_ORDERS
        }
    }
}
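
Note that these entries only fire if a beat process runs alongside the workers, e.g. (project module name assumed):

celery -A my_app beat -l info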
andilabs
  • Thank you for your answer. I tried this out and it did not help, but it pointed me in the right direction: it seems like the usage of the @shared_task decorator is the problem. I'll edit my question. – Tim Mar 10 '18 at 20:09
  • What decorator do you use to define your `fetch_shopify` celery task? – Tim Mar 10 '18 at 20:51
  • As there is only one hour left, I am going to accept my answer but award the bounty to you, as you pointed me in the right direction. – Tim Mar 12 '18 at 10:02