I have a number of Celery tasks that run as long-running processes, so I'd like to implement a custom state in order to query their progress.
According to the documentation, implementing a custom state for a task is easy enough:
```python
import os

from celery.utils.log import get_logger

# `app` (the Celery application) and `settings` are defined elsewhere in the project.


def download_count(wget_base_path):
    # Recursively traverse the root folder and return the total count of files
    return sum(len(files) for _, _, files in os.walk(wget_base_path))


@app.task(bind=True)
def html_download(self, url='', cl_id=-1):
    log = get_logger(__name__)
    ...
    # wget download location
    wget_base_path = settings.WGET_PATH + str(cl_id)
    os.system(wget_cmd)  # runs wget and waits for it to exit
    if not self.request.called_directly:
        log.debug('State progress called')
        self.update_state(state='PROGRESS',
                          meta={'item_count': download_count(wget_base_path)})
```
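As a quick sanity check, the `download_count` helper can be exercised on its own, independent of Celery. This sketch builds a small throwaway directory tree (the file names are arbitrary) and confirms that files in nested subfolders are counted too, while directories themselves are not.

```python
import os
import tempfile


def download_count(wget_base_path):
    # Recursively walk the folder and sum the number of files at each level
    return sum(len(files) for _, _, files in os.walk(wget_base_path))


# Build a temporary tree: two files at the top level, one in a subfolder
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.html", "b.html", os.path.join("sub", "c.html")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("x")
    count = download_count(root)

print(count)  # 3
```

Note that `os.walk` silently yields nothing for a path that does not exist yet, so the helper returns 0 before wget has created the download folder.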
Now, when I call the task like this:

```python
from app.ingest.tasks import html
ingest = html.html_download.delay(url, 54431)
```
the job starts as expected, but whenever I try to read the updated state, none of the metadata comes back. For example:
```
In [6]: ingest.state
Out[6]: 'PENDING'

In [10]: ingest._get_task_meta()
Out[10]: {'result': None, 'status': 'PENDING'}
```
Could the `os.system` call for the wget command be blocking everything? If I use `subprocess` instead, the task finishes very quickly while the child process is still running.
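To illustrate the blocking behaviour in question: `os.system` runs the command and only returns once it has exited, whereas `subprocess.Popen` returns immediately, leaving the caller free to report progress while the child is still running. The sketch below is a hypothetical, Celery-free illustration under those assumptions: `run_with_progress` and its `report` callback are made-up names, with `report` standing in for something like `self.update_state(state='PROGRESS', meta=...)`, and a small Python child process that writes files slowly standing in for wget.

```python
import os
import subprocess
import sys
import tempfile
import time


def download_count(path):
    # Total number of files under `path`, including subfolders
    return sum(len(files) for _, _, files in os.walk(path))


def run_with_progress(cmd, watch_path, report, interval=0.1):
    """Hypothetical helper: start `cmd` without blocking, then poll it.

    `report` is called with a meta dict on every poll while the child runs,
    and once more after it exits.
    """
    proc = subprocess.Popen(cmd)  # returns immediately, unlike os.system
    while proc.poll() is None:  # None means the child is still running
        report({"item_count": download_count(watch_path)})
        time.sleep(interval)
    report({"item_count": download_count(watch_path)})  # final count
    return proc.returncode


# Stand-in for wget: a child process that writes 5 files, one at a time
child_code = (
    "import os, sys, time\n"
    "for i in range(5):\n"
    "    open(os.path.join(sys.argv[1], f'f{i}.html'), 'w').close()\n"
    "    time.sleep(0.05)\n"
)

updates = []
with tempfile.TemporaryDirectory() as target:
    rc = run_with_progress([sys.executable, "-c", child_code, target],
                           target, updates.append)

print(rc, updates[-1])  # 0 {'item_count': 5}
```

The polling loop is what keeps the caller alive until the child finishes; a task that starts `Popen` and returns straight away would, as observed above, end long before the download does.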