4

I'm using Superset to build realtime dashboards and I want datasources (built from sqlLab resultset) be updated and cached in background on a specific day and time.

I connected superset to MySql and built some queries in SQLLab. The result of these queries is used as datasources to build charts and dashboards. The problem is that sometimes these queries take too long and chart/dashboards are not displayed correctly (e.g Query timeout - Warning Message is showed). I configured a cache time and celery async. worker, but when this cache expires dashboards are refreshed and queries related to charts/datasources are executed showing the timeout (or other error) message.

Actual results:

  • Chart/dashboards show timeout warning message when refresh.

Expected results:

  • Charts/dashboards always read from cache to avoid run long queries.
  • Schedule background async. datasource refresh on a specific day and time.
TylerH
  • 20,799
  • 66
  • 75
  • 101
lmdza
  • 41
  • 1
  • 4
  • Did you ever found a way to tag your dashboards? I'm also looking for the Tag Warm Up Cache Strategy but I can't seem to find a way to tag my dashboards. – Nicolas Aug 06 '21 at 16:23

2 Answers2

2

Superset can execute a query (aka "datasource" in superset) and cache its results for a period of time (the Chart Cache Timeout parameter on datasources). You'd use this caching to avoid having to re-query the database when the data is "fresh enough" - e.g. a daily sales chart can be cached for 24 hours.
Conceptually, Superset is a data visualization tool, but not a data preparation/transformation tool, and as such, afaik, it has no concept of scheduling background jobs to do automatic periodic refreshes on specific days/times.

What you'd want to do instead is to create materialized views in your database, and use these views/tables as datasources for your superset queries and charts.
You will need to schedule the refreshing of those material views with a different tool - which could be as simple as a cron job.
You can then use these materialized views as datasources for your charts. The data in these views being effectively static, there is no need to run your superset queries asynchronously or to cache the results in superset.

David Tobiano
  • 1,188
  • 8
  • 10
  • David, thank for your response. I found that we can schedule cache refresh using different strategies (with celery beats). One of these strategies is Tag based (refresh cache of specific Tagged objects). I think it can be used to schedule datasource refresh, but I still dont know how to Tag objects in Superset. – lmdza Sep 12 '19 at 14:25
  • The only thing I see related to Celery beats is to allow email report tasks (but I'm not seeing other pre-defined task types, such as query refresh) - see https://superset.incubator.apache.org/installation.html#email-reports. I also see that you can expose queries you want to have refreshed at `/savedqueryviewapi/api/read` - see https://superset.incubator.apache.org/installation.html#sql-lab – David Tobiano Sep 13 '19 at 16:26
  • Here you can see the cache warm up strategies https://github.com/apache/incubator-superset/blob/master/superset/tasks/cache.py. I want to use DashboardTagsStrategy but I don´t know how can I tag charts or dashboards. – lmdza Sep 24 '19 at 18:49
  • I see. Here's the issue and PRs for dashboard tags: https://github.com/apache/incubator-superset/issues/3821 https://github.com/janetye/incubator-superset/pull/1 https://github.com/janetye/incubator-superset/pull/2 I need to play around and try it... – David Tobiano Sep 24 '19 at 22:28
0

Solution moved from the question to an answer:

I found that superset has a Celery task that will periodically warm up the cache based on different strategies. One of these strategies (DashboardTagsStrategy) warm up cache of charts/dashboards with custom tags. I understand that if we add a CELERYBEAT_SCHEDULE section in config.py

specifying a crontab and tagging specific charts, we can schedule background async datasource refresh on a specific day and time. Steps to follow would be:

  • Execute a query in SQL Lab.
  • Use the result of this query as a datasource.
  • Build a chart based on this datasource.
  • Tag this chart (I still don´t know how to Tag charts or dashboards)
  • Add a warm up cache strategy in config.py based on Tags.
  • Configure cache timeout in charts (or datasources?)
TylerH
  • 20,799
  • 66
  • 75
  • 101