I want to join two silver LIVE tables that are being streamed in order to create a gold table. However, I have run into multiple errors, including "RuntimeError: Query function must return either a Spark or Koalas DataFrame". I'm not sure where I'm going wrong, so if anybody has a solution to the problem, that would be much appreciated!
2 Answers
You can join the tables as if they were DataFrames and return the result as a new table:
    import dlt
    from pyspark.sql.functions import *
    from pyspark.sql.types import *

    # data_path_data_one and data_path_data_two are placeholders for your source paths

    # First Silver table definition
    @dlt.table(
        comment="Silver Table One"
    )
    def silver_table_one():
        return spark.read.format("json").load(data_path_data_one)

    # Second Silver table definition
    @dlt.table(
        comment="Silver Table Two"
    )
    def silver_table_two():
        # spark.read.csv is a method, so use format("csv").load(...) instead of csv.load(...)
        return spark.read.format("csv").load(data_path_data_two)

    # Joining the two Silver tables by referring to them by their function names
    @dlt.table(
        comment="Joining Silver Tables"
    )
    def my_gold_table():
        silver_one = dlt.read("silver_table_one")
        silver_two = dlt.read("silver_table_two")
        return (
            silver_one.join(silver_two, silver_one.id == silver_two.id, how="inner")
        )
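As for the error message itself: it generally means the decorated function handed back something other than a DataFrame, for example `None` because the `return` is missing. The sketch below only illustrates that idea in plain Python. `FakeDataFrame` and `table` are hypothetical stand-ins that I made up for this example; they are not the real Spark or `dlt` classes, and this is not how Databricks implements the check.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a Spark DataFrame (not a real Spark class)."""

    def __init__(self, rows):
        self.rows = rows


def table(func):
    """Hypothetical stand-in for @dlt.table: validates what the query function returns."""

    def run():
        result = func()
        if not isinstance(result, FakeDataFrame):
            raise RuntimeError(
                "Query function must return either a Spark or Koalas DataFrame"
            )
        return result

    return run


@table
def broken_gold_table():
    joined = FakeDataFrame([{"id": 1}])
    # Bug: the joined result is never returned, so the function yields None.


@table
def working_gold_table():
    joined = FakeDataFrame([{"id": 1}])
    return joined  # Returning the DataFrame-like object passes the check.
```

Calling `broken_gold_table()` raises the `RuntimeError`, while `working_gold_table()` returns the object. In a real pipeline, the equivalent check is to make sure every `@dlt.table` function ends by returning the DataFrame rather than, say, the result of an action called on it.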

Axel R.
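Referring to the answer given by @Axel R., you could also write the join by passing the join column as a list of names. With this form the result keeps a single id column instead of one from each table: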
    @dlt.table(
        comment="Joining Silver Tables"
    )
    def my_gold_table():
        silver_one = dlt.read("silver_table_one")
        silver_two = dlt.read("silver_table_two")
        return (
            silver_one.join(silver_two, ["id"], how="inner")
        )
See here.

Hackerman443