4

For Data Visualisation purposes, I am using Databricks to create dashboards. This is achieved by creating charts in a notebook and linking those charts to the dashboard.

But I'm facing an issue in refreshing the data linked to the dashboard. I tried creating a job to refresh the entire notebook. The job succeeded but the dashboard linked to it was still showing previous data.

I believe Databricks Jobs creates a separate instance, runs the workflows and discards the instance.

With this, I won’t be able to get the updated data in the dashboard. I have to manually click on the Run All button.

Is there any other way this could be done automatically?

Sanjay
  • 328
  • 1
  • 3
  • 13
  • As this post has been within the last months of writing my comment, here my observations: I am working also with Databricks and I am using Google Chrome, since a while I experience also issues with autorefresh. Commands are shown as in progress, in reality they are done. I assume this is a glitch in the front end and this will be fixed at some point – Stefan Papp May 13 '22 at 05:45
  • I also am not able to get my dashboards to auto-refresh either. And I am fairly certain this isn't a browser issue. – Rebekah Waterbury Feb 02 '23 at 20:15
  • Same here, the updated notebook results are only accessible via the job run results, **not** via the dashboard view. – py_j Feb 08 '23 at 08:12
  • I am having the same issue. Any update on this topic? – Liky Mar 29 '23 at 08:40

2 Answers2

2

I believe Databricks Jobs creates a separate instance, runs the workflows and discards the instance.

Unfortunately, this assumption appears to be right: the Databricks support supplied me with the following explanation (Feb 2023):

For now, sharing a dashboard that can be regularly updated can be done only through its job run result. [...] Now, the challenge you have is to share a version of the notebook without any code and with interactive visualizations.

If that is the case, you can create a dashboard based on the visualisations in the notebook, and in the job result page, in the output toggle, select [...] dashboard or results only options. You will get a link to that specific dashboard like /run/latestSuccess/dashboard/{dashboard_id}.

For my use case (sharing updated dashboards with non-coder audience), I've setup scheduled runs of the notebooks, added the users to the job run notifications (only for successful runs) and supplied them with an illustrated guide how to access the dashboard view from that email (subject like [long number] Success--view run [number] of [name of notebook]:

  1. click link "View run in Databricks"
  2. In "Output" dropdown menu select Dashboard: <name of your dashboard>
  3. Click the right arrow in the sidebar "Task run details" to hide the sidebar

If you find a way to setup a dashboard with a single URL that is updated periodically, please let me know!

py_j
  • 381
  • 3
  • 7
0

First of all apologies for the late reply.

My use case here was to embed the databricks dashboard in a web application. So I had created the visualizations in a notebook and had scheduled it. The issue I was seeing was the notebook was not updating with the new data even I had scheduled it to run daily. But that is how Databricks works. Everytime it creates a new cluster, runs the job and discards it. We can see the output run in the jobs section.

So in order to embed the visualization in website, we have to take REST API route. We tried to export the dashboard as html and use the downloaded html file in an iframe, it worked but the data would be static. We tried to export the dashboard as html and use the downloaded html file in an iframe, it worked but the data would be static.

So I came up with REST API approach.

  • First we need to create a job which runs the notebook either manually or on a schedule
  • Once created, run the job once
  • You can see all the Job runs in Jobs → Job Id → Runs
  • For embedding DBC in web app, we need this Job Id of the notebook. From this, we can get the latest successful run_id of the job
  • Once we have the run_id, we can export the dashboard linked to it as html and use it in rendering the view
  • Before we go to the code changes, we need to create an Access token which is needed in API
  • Go to Settings → User Settings → Access Tokens → Generate new Token → Store the token in a secured place
  • Code to get last successful run id with a given job id:

private static Int32 GetRunId(string bearerToken, Int32 job_id)
        {
            string responseString1 = string.Empty;
            string url1 = @"https://host-name/api/2.1/jobs/runs/list?job_id=" + job_id.ToString();
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url1);
            request.Headers.Add("Authorization", "Bearer " + bearerToken);
            request.AutomaticDecompression = DecompressionMethods.GZip;
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                responseString1 = reader.ReadToEnd();
            }
            JObject json1 = JObject.Parse(responseString1);
            Int32[] run_id_array = new Int32[] { };
            List<Int32> run_id_list = new List<Int32>();
            foreach (var item in json1["runs"])
            {
                if ((item["state"]["result_state"]).ToString() == "SUCCESS")
                {
                    run_id_list.Add((Int32)(item["run_id"]));
                }
            }
            if (run_id_list.Count == 0)
            {
                throw new InvalidOperationException("Empty list");
            }
            Int32 max_run_id = Int32.MinValue;
            foreach (Int32 item in run_id_list)
            {
                Int32 value = item;
                if (value > max_run_id)
                {
                    max_run_id = value;
                }
            }
            Int32 run_id = max_run_id;
            return run_id;
        }
  • Using the run id, get the report export as html in string format

private static string GetReportExport(string bearerToken, Int32 run_id)
        {
            string responseString2 = string.Empty;
            string url2 = @"https://host-name/api/2.0/jobs/runs/export?run_id=" + run_id.ToString() + "&views_to_export=DASHBOARDS";
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url2);
            request.Headers.Add("Authorization", "Bearer " + bearerToken);
            request.AutomaticDecompression = DecompressionMethods.GZip;
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                responseString2 = reader.ReadToEnd();
            }
            JObject json = JObject.Parse(responseString2);
            string content = (json["views"][0]["content"]).ToString();
            return content;
        }
  • Create a new function Databricks, where bearer token and job id is stored and call the above 2 functions. The second function returns the dashboard exported as html in string format. This is stored in a ViewBag and used in View to render the dashboard

public IActionResult Databricks()
        {
            string bearerToken = "<Bearer_Token>";
            Int32 job_id = xxx;
            Int32 run_id = GetRunId(bearerToken, job_id);
            ViewBag.databricks_url1 = GetReportExport(bearerToken, run_id);
            return View();
        }
  • Create a new View and add below code

@{ ViewData["Title"] = "Databricks";
    Layout = "~/Views/Shared/_Layout_ws1.cshtml"; }
<h1>@ViewData["Title"]</h1>
<br />
<h3>Databricks Test Dashboard 1</h3>
<br />
<p>Embedding dbc html string in an iframe</p>
<br />
<iframe srcdoc="@ViewBag.databricks_url1" width="1000" height="750" frameborder="0"></iframe>
  • This should load the dashboard in the website
Sanjay
  • 328
  • 1
  • 3
  • 13