
I am using Delta Lake OSS version 0.8.0.

Let's assume we calculated aggregated data and cubes from the raw data and saved the results in a gold table using Delta Lake.

My question is: is there a well-known way to access this gold table data and deliver it to, for example, a web dashboard?

In my understanding, you need a running Spark session to query a Delta table. So one possible solution could be to write a web API that executes these Spark queries. You could also write the gold results to a database like Postgres for access, but that seems like just duplicating the data.
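A rough sketch of what I have in mind with the web API idea, just to make it concrete (the gold table path and the endpoint are made up, and Spark runs embedded in local mode):

```python
# Hypothetical sketch: a FastAPI endpoint backed by an embedded,
# local-mode Spark session that reads the gold Delta table.
from fastapi import FastAPI
from pyspark.sql import SparkSession

app = FastAPI()

# Delta Lake 0.8.0 needs these extension/catalog settings on Spark 3.x.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("gold-table-api")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

@app.get("/aggregates")
def aggregates():
    # Read the gold table and return the rows as JSON-friendly dicts.
    df = spark.read.format("delta").load("/data/gold/aggregates")
    return [row.asDict() for row in df.limit(100).collect()]
```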

Is there a known best practice solution?

Chris

1 Answer


The real answer depends on your requirements regarding latency, number of requests per second, amount of data, deployment options (cloud/on-prem, where the data is located: HDFS/S3/...), etc. Possible approaches are:

  1. Run Spark in local mode inside your application (roughly what the sketch in your question shows); this may require a lot of memory, etc.
  2. Run the Thrift JDBC/ODBC server as a separate process and access the data via JDBC/ODBC (see the first sketch below).
  3. Read the data directly using the Delta Standalone Reader library for the JVM, or via the delta-rs library, which works with Rust/Python/Ruby (see the second sketch below).
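For option 2, a minimal sketch of the client side, assuming the Thrift server is already running and reachable (host, port, and table name are placeholders). The server speaks the HiveServer2 protocol, so a client like PyHive works:

```python
# Sketch: querying the gold table through the Spark Thrift JDBC/ODBC
# server using PyHive (HiveServer2 protocol). Connection details and
# the table name are placeholders.
from pyhive import hive

conn = hive.connect(host="thrift-server-host", port=10000)
cursor = conn.cursor()
cursor.execute("SELECT * FROM gold_aggregates LIMIT 100")
rows = cursor.fetchall()
cursor.close()
conn.close()
```

The web application then talks plain SQL over a socket and never needs its own Spark session.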
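And for option 3, a sketch using the delta-rs Python bindings (the `deltalake` package); the table path is a placeholder, and note that this materializes the result in memory, so it is best suited for small, pre-aggregated gold tables:

```python
# Sketch: reading the gold Delta table without any Spark dependency
# via delta-rs. The path is a placeholder.
from deltalake import DeltaTable

dt = DeltaTable("/data/gold/aggregates")
df = dt.to_pyarrow_table().to_pandas()  # loads the table into memory
```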
Alex Ott
  • thanks for your input. I did have a look at exactly these 3 possibilities. But as a first approach I will try a REST server like [livy](https://livy.apache.org/) or [spark jobserver](https://github.com/spark-jobserver/spark-jobserver) and call it via [fast api](https://fastapi.tiangolo.com/) – Chris Feb 27 '21 at 15:20
  • I didn’t have the best experience with jobserver, so now I prefer JDBC ;-) – Alex Ott Feb 27 '21 at 20:51