Whenever a Celery task is enqueued, I want to attach contextual metadata that the worker will be able to use.

The following code works, but I would like a more idiomatic, Celery-style solution.

from celery.signals import before_task_publish, task_prerun

@before_task_publish.connect
def receiver_before_task_publish(sender=None, headers=None, body=None, **kwargs):
    task_kwargs = body[1]
    metadata = {"foo": "bar"}
    # Smuggle the metadata into the task's keyword arguments.
    task_kwargs['__metadata__'] = metadata

@task_prerun.connect
def receiver_task_pre_run(task_id, task, *args, **kwargs):
    metadata = kwargs['kwargs'].pop('__metadata__', {})
    # metadata == {"foo": "bar"}
jrobichaud
  • Where do the metadata come from? Do you know them (or can infer them) upfront at the time of task definition or only at the time the task is being enqueued? – Tomáš Linhart Apr 13 '19 at 06:56
  • Only when enqueued. The data comes from the calling django request. – jrobichaud Apr 13 '19 at 10:44
  • And isn't it possible to modify the task to take the metadata as arguments? – Tomáš Linhart Apr 13 '19 at 14:21
  • I do not want the tasks to be modified. My goal is to implement a plugin for structured logging that just works after only minimal configuration. My plugin must pass context from django’s request to the task’s logger without any code change. I have a working proof of concept but the implementation seems too hackish to me. I hope celery has a mechanism for this purpose. – jrobichaud Apr 13 '19 at 15:07

1 Answer

When a task starts in the worker, the headers set in before_task_publish are passed to push_request as part of its **kwargs.

celery/app/task.py:1000

    def push_request(self, *args, **kwargs):
        self.request_stack.push(Context(*args, **kwargs))

The constructor of Context does something convenient: it forwards its arguments to self.__dict__.update(), so every key becomes an attribute. For example, Context(metadata={'foo': 'bar'}).metadata returns {'foo': 'bar'}.

celery/app/task.py:99

class Context(object):
# ...
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def update(self, *args, **kwargs):
        return self.__dict__.update(*args, **kwargs)
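
To see the effect of that pattern in isolation, here is a minimal sketch using a stand-in class. DemoContext is a hypothetical name and is not Celery's own Context; it only copies the two methods quoted above.

```python
class DemoContext:
    """Hypothetical stand-in that mirrors the __init__/update pattern above."""

    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def update(self, *args, **kwargs):
        # dict.update accepts a mapping and/or keyword arguments,
        # so every key ends up as an instance attribute.
        return self.__dict__.update(*args, **kwargs)


# Keys from a positional mapping and from keyword arguments both become attributes.
ctx = DemoContext({"id": "abc123"}, __metadata__={"foo": "bar"})
print(ctx.id)
print(ctx.__metadata__)
```

Because the name `__metadata__` ends with two underscores, Python's private-name mangling does not apply to it, so plain attribute access works.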

The task context is accessible from Task's request property.

celery/app/task.py:1019

class Task(object):
# ...
    def _get_request(self):
        """Get current request object."""
        req = self.request_stack.top
        if req is None:
            # task was not called, but some may still expect a request
            # to be there, perhaps that should be deprecated.
            if self._default_request is None:
                self._default_request = Context()
            return self._default_request
        return req
    request = property(_get_request)
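
Putting the three excerpts together, the flow can be sketched without Celery at all. DemoStack and DemoTask below are hypothetical stand-ins (Celery's real request stack and Context are more involved), and SimpleNamespace plays the role of Context.

```python
from types import SimpleNamespace


class DemoStack:
    """Hypothetical minimal stack exposing push() and a `top` property."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    @property
    def top(self):
        return self._items[-1] if self._items else None


class DemoTask:
    """Hypothetical stand-in mirroring push_request() and the request property."""

    def __init__(self):
        self.request_stack = DemoStack()
        self._default_request = None

    def push_request(self, **kwargs):
        # Keyword arguments (including custom headers) become attributes.
        self.request_stack.push(SimpleNamespace(**kwargs))

    @property
    def request(self):
        req = self.request_stack.top
        if req is None:
            # Nothing pushed yet: fall back to an empty default context.
            if self._default_request is None:
                self._default_request = SimpleNamespace()
            return self._default_request
        return req


task = DemoTask()
before = getattr(task.request, "__metadata__", {})
task.push_request(id="task-1", __metadata__={"foo": "bar"})
after = getattr(task.request, "__metadata__", {})
print(before)
print(after)
```

Before anything is pushed, the lookup falls back to the empty default; afterwards the custom key is readable straight from the request.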

Which means the final solution is simply this:

from celery.signals import before_task_publish, task_prerun

@before_task_publish.connect
def receiver_before_task_publish(sender=None, headers=None, body=None, **kwargs):
    metadata = {"foo": "bar"}
    headers['__metadata__'] = metadata

@task_prerun.connect
def receiver_task_pre_run(task_id, task, *args, **kwargs):
    metadata = getattr(task.request, '__metadata__', {}) 
    # metadata == {"foo": "bar"}

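Note: task.request.__metadata__ would also work, but it raises an AttributeError for any task that was enqueued before the signal handlers were installed, because its headers carry no __metadata__ key. Using getattr with a default is safer.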
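
The difference between the two lookups can be shown with plain objects. This is only an illustration: old_request and new_request are hypothetical stand-ins for task.request, built with SimpleNamespace rather than Celery.

```python
from types import SimpleNamespace

# Stand-in for a task enqueued before the signal handlers existed.
old_request = SimpleNamespace(id="task-1")
# Stand-in for a task enqueued with the before_task_publish handler in place.
new_request = SimpleNamespace(id="task-2", __metadata__={"foo": "bar"})


def read_metadata(request):
    """Return the metadata if present, otherwise an empty dict."""
    return getattr(request, "__metadata__", {})


print(read_metadata(old_request))
print(read_metadata(new_request))

# Direct attribute access fails for the older request.
try:
    old_request.__metadata__
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True
print(direct_access_failed)
```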

jrobichaud