
I have a number of Celery tasks that are long-running processes. I'd like to implement a custom state so that I can query their progress.

According to the documentation, it is easy enough to implement a custom state for the given task.

def download_count(wget_base_path):
    # recursively traverse root folder and return count of files
    return sum([len(files) for r, d, files in os.walk(wget_base_path)])

@app.task(bind = True)
def html_download(self, url='', cl_id=-1):

    log = get_logger(__name__)
    ...
    # wget download location
    wget_base_path = settings.WGET_PATH + str(cl_id) 

    os.system(wget_cmd)

    if not self.request.called_directly:
        log.debug('State progress called')
        self.update_state(state = 'PROGRESS', meta = {'item_count' : download_count(wget_base_path)})

Now, when I call this via

from app.ingest.tasks import html

ingest = html.html_download.delay(url, 54431)

the job kicks off as expected. But any time I try to get the updated state, I don't get any of the metadata.

For example,

In [6]: ingest.state
Out[6]: 'PENDING'


In [10]: ingest._get_task_meta()
Out[10]: {'result': None, 'status': 'PENDING'}

Could the os.system call for the wget command be blocking everything? If I use subprocess instead, the task finishes very quickly while the child process is still executing.

Jason

1 Answer


Celery stores task states and results in the result backend. When a backend is configured but Celery cannot find any record for the task_id it is given, it reports the state as PENDING by default. I can think of two possibilities in this case:

  1. The task may still be running and may not have reached the update_state call yet. os.system blocks until wget exits, so the state is only updated after the download has finished.
  2. Celery has a result_expires setting (CELERY_TASK_RESULT_EXPIRES in older versions) that deletes task metadata after that interval. Any query made after the metadata has expired returns PENDING.
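
If the first possibility is the cause, one way around it is to start wget as a child process and report progress while it is still running, instead of once after os.system returns. The following is only a minimal sketch under a few assumptions: run_with_progress and its report callback are hypothetical names that do not come from the question, and the polling interval is arbitrary. download_count is the same helper as in the question.

```python
import os
import subprocess
import time


def download_count(base_path):
    # Recursively count the files under base_path (same idea as in the question).
    return sum(len(files) for _, _, files in os.walk(base_path))


def run_with_progress(cmd, base_path, report, interval=1.0):
    """Run cmd as a child process and call report(count) until it exits.

    report is a hypothetical callback. Inside a bound Celery task it could be
    lambda n: self.update_state(state='PROGRESS', meta={'item_count': n})
    """
    proc = subprocess.Popen(cmd)
    # poll() returns None for as long as the child process is still running.
    while proc.poll() is None:
        report(download_count(base_path))
        time.sleep(interval)
    # Report once more so the final count is recorded after the child exits.
    report(download_count(base_path))
    return proc.returncode
```

Inside html_download, the os.system(wget_cmd) line could then be swapped for a call to run_with_progress, with wget_cmd passed as an argument list and report wrapping self.update_state. While the task is running, ingest.state should show PROGRESS and ingest.info should hold the meta dictionary, provided a result backend is configured.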
Lemon Reddy