
I want to be able to delete tasks from Flower's successful/failed task lists.

My idea is to have a scheduled Celery beat task that deletes tasks older than X hours.

Does anyone know how to achieve this? Where are the tasks stored, etc.?

Goal: set a configuration variable specifying the number of hours to keep the logs (say, 48 hours), then auto-delete anything older.

This mainly serves people in Europe who need GDPR compliance, and it also protects customers' privacy.

Omar
  • Did you think about contributing that to the Flower itself? It could be some Flower configuration parameter that enables this behaviour... – DejanLekic Feb 03 '22 at 15:31
  • @DejanLekic I would love to contribute it, yes. But I need to understand how it stores its information, currently I'm on a docker setup – Omar Feb 04 '22 at 06:53
  • We have some GDPR rules in place not to store such information, so I guess many can benefit from this feature – Omar Feb 04 '22 at 06:54
  • A good start is to analyse Flower's state file, and see what is stored there. – DejanLekic Feb 04 '22 at 13:30
  • @DejanLekic what is flower state file? where is it located ? – Omar Feb 04 '22 at 13:47
  • 1
    @OmarS. I believe this question can help: https://stackoverflow.com/questions/59319550/i-want-to-delete-all-flower-celery-history-logs-but-it-does-not-work – Paulo Feb 13 '22 at 12:29
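Regarding the state-file comments above: when Flower is started with `--persistent=True` and `--db=flower`, it keeps its event state in a file on disk via Python's `shelve` module (the exact key layout is an implementation detail and may change between versions). A minimal sketch of the shelve mechanics involved, using a throwaway file rather than a real Flower database:

```python
import os
import shelve
import tempfile

# A shelve file maps string keys to pickled Python objects, which is how a
# persistent state file can hold arbitrary task metadata.
path = os.path.join(tempfile.mkdtemp(), 'flower')
with shelve.open(path) as db:
    db['events'] = {'task-id-1': {'state': 'SUCCESS'}}

# Re-open read-only and inspect, the same way you would poke at a real state file
with shelve.open(path, flag='r') as db:
    for key, value in db.items():
        print(key, value)
```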

2 Answers


I implemented new APIs for deleting tasks

Events:
  None delete_tasks_by_time(int to_timestamp)

Example: delete tasks older than 48 hours

import os
from datetime import datetime, timedelta
from types import SimpleNamespace

import celery
from flower.app import Flower

broker = os.environ.get('CELERY_BROKER_URL')
app = celery.Celery('tasks', broker=broker)

# Flower needs its options object; point it at the persistent state database
flower_options = SimpleNamespace(db='flower', persistent=True, purge_offline_workers=1)
flower = Flower(capp=app, options=flower_options)

# Delete all tasks older than 48 hours
delete_before_time = datetime.now() - timedelta(hours=48)
flower.events.delete_tasks_by_time(delete_before_time.timestamp())

Then we can add a Celery beat schedule that runs each hour and deletes the old tasks:

from datetime import datetime, timedelta
from types import SimpleNamespace

from celery.schedules import crontab

@app.task(queue='cleanup_tasks')
def clean_up_tasks():
    from flower.app import Flower

    # note: better to read these settings from env vars
    flower_options = SimpleNamespace(db='flower', persistent=True, purge_offline_workers=1)
    flower = Flower(capp=app, options=flower_options)

    delete_before_time = datetime.now() - timedelta(hours=48)
    flower.events.delete_tasks_by_time(delete_before_time.timestamp())

@app.on_after_configure.connect
def add_periodic(**kwargs):
    app.add_periodic_task(crontab(hour="*", minute=0), clean_up_tasks.s(), name='cleanup-tasks')

This is especially helpful for people who want to maintain GDPR compliance: you keep data only for a short time, essentially just for debugging, and everything is configurable to taste.
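To make the retention window configurable rather than hardcoded at 48 hours, the cutoff computation could be driven by an environment variable. Note that `FLOWER_TASK_RETENTION_HOURS` below is a made-up name for illustration, not a real Flower setting:

```python
import os
from datetime import datetime, timedelta

# Hypothetical env var; defaults to 48 hours as in the examples above
retention_hours = int(os.environ.get('FLOWER_TASK_RETENTION_HOURS', '48'))
delete_before_time = datetime.now() - timedelta(hours=retention_hours)
cutoff_ts = delete_before_time.timestamp()
```

The resulting `cutoff_ts` is what you would pass to `delete_tasks_by_time`.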

References:
  • https://github.com/mher/flower/issues/1189
  • https://github.com/mher/flower/pull/1188

Hopefully the PR gets merged soon!

Enjoy <3

Omar

I used Celery beat with Flower:

from celery.schedules import crontab
from kombu import Queue

app.config_from_object("django.conf:settings", namespace="CELERY")
app.conf.beat_scheduler = "django_celery_beat.schedulers.DatabaseScheduler"
# app.conf.task_default_queue = 'default'

app.conf.task_queues = (
    Queue("default", exchange="default", routing_key="default"),
    Queue("data_queue", exchange="data_queue", routing_key="data"),
    # Queue("cleanup_queue", exchange="cleanup_queue", routing_key="cleanup"),  # you don't necessarily need this queue
)

# Load task modules from all registered Django apps.
app.autodiscover_tasks()

app.conf.beat_schedule = {
    "data": {
        "task": "app.name.execute_data_tasks",
        "schedule": crontab(minute="*/3"),
    },
    "cleanup": {
        "task": "app.name.execute_cleanup_tasks",
        "schedule": crontab(minute=0),  # hourly; you can play around with the config here
    },
}
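The beat schedule above refers to a task named `app.name.execute_cleanup_tasks`, which the answer never shows. A minimal sketch of what its body could do; the actual deletion call assumes the `delete_tasks_by_time` API from the other answer's PR, so it is stubbed out here to keep the cutoff logic visible:

```python
from datetime import datetime, timedelta

RETENTION_HOURS = 48  # keep task history for two days, per the question's goal

def compute_cutoff(now=None):
    """Return the POSIX timestamp before which task events should be deleted."""
    now = now or datetime.now()
    return (now - timedelta(hours=RETENTION_HOURS)).timestamp()

def execute_cleanup_tasks():
    # In a real project this function would be decorated with @app.task (or
    # @shared_task) so that beat can dispatch it by the name in the schedule.
    cutoff = compute_cutoff()
    # flower.events.delete_tasks_by_time(cutoff)  # API from the other answer's PR
    return cutoff
```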
Dharman
Yusuf Ganiyu