
I am using Delta Lake OSS version 0.8.0.

Let's assume we calculated aggregated data and cubes from the raw data and saved the results in a gold table using Delta Lake.

My question is: is there a well-known way to access this gold table data and deliver it to, for example, a web dashboard?

In my understanding, you need a running Spark session to query a Delta table. So one possible solution could be to write a web API that executes these Spark queries. You could also write the gold results to a database like Postgres for access, but that seems like just duplicating the data.
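A rough sketch of what I have in mind with the web API idea, just to make it concrete (the gold table path and the endpoint are made up, and Spark runs embedded in local mode):

```python
# Hypothetical sketch: a FastAPI endpoint backed by an embedded,
# local-mode Spark session that reads the gold Delta table.
from fastapi import FastAPI
from pyspark.sql import SparkSession

app = FastAPI()

# Delta Lake 0.8.0 needs these extension/catalog settings on Spark 3.x.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("gold-table-api")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

@app.get("/aggregates")
def aggregates():
    # Read the gold table and return the rows as JSON-friendly dicts.
    df = spark.read.format("delta").load("/data/gold/aggregates")
    return [row.asDict() for row in df.limit(100).collect()]
```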

Is there a known best practice solution?

Chris

1 Answer


The real answer depends on your requirements regarding latency, number of requests per second, amount of data, deployment options (cloud/on-prem, where the data is located: HDFS/S3/...), etc. Possible approaches are:

  1. Run Spark in local mode inside your application (roughly what the sketch in your question shows); this may require a lot of memory, etc.
  2. Run the Thrift JDBC/ODBC server as a separate process and access the data via JDBC/ODBC (see the first sketch below).
  3. Read the data directly using the Delta Standalone Reader library for the JVM, or via the delta-rs library, which works with Rust/Python/Ruby (see the second sketch below).
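For option 2, a minimal sketch of the client side, assuming the Thrift server is already running and reachable (host, port, and table name are placeholders). The server speaks the HiveServer2 protocol, so a client like PyHive works:

```python
# Sketch: querying the gold table through the Spark Thrift JDBC/ODBC
# server using PyHive (HiveServer2 protocol). Connection details and
# the table name are placeholders.
from pyhive import hive

conn = hive.connect(host="thrift-server-host", port=10000)
cursor = conn.cursor()
cursor.execute("SELECT * FROM gold_aggregates LIMIT 100")
rows = cursor.fetchall()
cursor.close()
conn.close()
```

The web application then talks plain SQL over a socket and never needs its own Spark session.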
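And for option 3, a sketch using the delta-rs Python bindings (the `deltalake` package); the table path is a placeholder, and note that this materializes the result in memory, so it is best suited for small, pre-aggregated gold tables:

```python
# Sketch: reading the gold Delta table without any Spark dependency
# via delta-rs. The path is a placeholder.
from deltalake import DeltaTable

dt = DeltaTable("/data/gold/aggregates")
df = dt.to_pyarrow_table().to_pandas()  # loads the table into memory
```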
Alex Ott
  • thanks for your input. I did have a look at exactly these 3 possibilities. But as a first approach I will try a REST server like [livy](https://livy.apache.org/) or [spark jobserver](https://github.com/spark-jobserver/spark-jobserver) and call it via [fast api](https://fastapi.tiangolo.com/) – Chris Feb 27 '21 at 15:20
  • I didn’t have the best experience with jobserver, so now I prefer JDBC ;-) – Alex Ott Feb 27 '21 at 20:51