I want to join two silver LIVE tables that are being streamed in order to create a gold table. However, I have run into multiple errors, including "RuntimeError: Query function must return either a Spark or Koalas DataFrame". I'm not sure where I'm going wrong, but if anybody has a solution to the problem, that would be much appreciated!

2 Answers

You can join the tables as if they were DataFrames and return the result as a new one:

import dlt
from pyspark.sql.functions import *
from pyspark.sql.types import *


# First Silver table definition
@dlt.table(
  comment="Silver Table One"
)
def silver_table_one():
  return (spark.read.format("json").load(data_path_data_one))

# Second Silver table definition
@dlt.table(
  comment="Silver Table Two"
) 
def silver_table_two():
  return (spark.read.format("csv").load(data_path_data_two))


# Joining the two Silver Tables by calling them by the "function" name
@dlt.table(
  comment="Joining Silver Tables"
)
def my_gold_table():
  silver_one = dlt.read("silver_table_one")
  silver_two = dlt.read("silver_table_two")
  return ( 
     silver_one.join(silver_two, silver_one.id == silver_two.id, how="inner")
  )
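
As for the error itself: the message says the decorated function handed back something that is not a DataFrame. Since the question's code is not shown, the following is only a guess at the cause: a forgotten `return`, or a function that ends in an action such as `.show()`, gives back `None`. The sketch below reproduces that mechanism in plain Python. `FakeDataFrame` and `fake_table` are hypothetical stand-ins written for this illustration; they are not part of `dlt` or `pyspark`, and the real decorator may perform its check differently.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a Spark DataFrame (not a pyspark class)."""

    def join(self, other, on, how="inner"):
        # A real join returns a new DataFrame; the stand-in does the same.
        return FakeDataFrame()


def fake_table(query_fn):
    """Hypothetical stand-in for @dlt.table: run the function, check the result."""

    def wrapper():
        result = query_fn()
        if not isinstance(result, FakeDataFrame):
            raise RuntimeError(
                "Query function must return either a Spark or Koalas DataFrame"
            )
        return result

    return wrapper


@fake_table
def broken_gold_table():
    silver_one, silver_two = FakeDataFrame(), FakeDataFrame()
    # The join result is discarded, so the function implicitly returns None.
    silver_one.join(silver_two, ["id"], how="inner")


@fake_table
def fixed_gold_table():
    silver_one, silver_two = FakeDataFrame(), FakeDataFrame()
    # Returning the joined DataFrame satisfies the check.
    return silver_one.join(silver_two, ["id"], how="inner")
```

Calling `broken_gold_table()` raises the `RuntimeError`, while `fixed_gold_table()` returns a DataFrame-like object. If your gold table function looks like the broken variant, adding the `return` is the first thing to try.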
Axel R.

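Referring to the answer given by @Axel R., you could also write the join as follows. Passing the join key as a list of column names keeps a single "id" column in the result, whereas joining on the expression silver_one.id == silver_two.id keeps both: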

@dlt.table(
  comment="Joining Silver Tables"
)
def my_gold_table():
  silver_one = dlt.read("silver_table_one")
  silver_two = dlt.read("silver_table_two")
  return ( 
     silver_one.join(silver_two, ["id"], how="inner")
  )

See here.