It was my understanding that references to streaming delta live tables require the use of the function STREAM()
, supplying the table name as an argument.
Given below is a code snippet that I found in one of the demo notebooks that Databricks provide. Here, I see the use of STREAM()
in the FROM
clause, but it has not been used in the LEFT JOIN
, even though that table is also a streaming table. This query still works.
What exactly is the correct syntax here?
CREATE OR REFRESH STREAMING LIVE TABLE sales_orders_cleaned(
CONSTRAINT valid_order_number EXPECT (order_number IS NOT NULL) ON VIOLATION DROP ROW
)
COMMENT "The cleaned sales orders with valid order_number(s) and partitioned by order_datetime."
AS
SELECT f.customer_id, f.customer_name, f.number_of_line_items,
timestamp(from_unixtime((cast(f.order_datetime as long)))) as order_datetime,
date(from_unixtime((cast(f.order_datetime as long)))) as order_date,
f.order_number, f.ordered_products, c.state, c.city, c.lon, c.lat, c.units_purchased, c.loyalty_segment
FROM STREAM(LIVE.sales_orders_raw) f
LEFT JOIN LIVE.customers c
ON c.customer_id = f.customer_id
AND c.customer_name = f.customer_name
Just for reference, given below are the other two tables that act as inputs to the above query,
CREATE OR REFRESH STREAMING LIVE TABLE sales_orders_raw
COMMENT "The raw sales orders, ingested from /databricks-datasets."
AS SELECT * FROM cloud_files("/databricks-datasets/retail-org/sales_orders/", "json", map("cloudFiles.inferColumnTypes", "true"))
CREATE OR REFRESH STREAMING LIVE TABLE customers
COMMENT "The customers buying finished products, ingested from /databricks-datasets."
AS SELECT * FROM cloud_files("/databricks-datasets/retail-org/customers/", "csv");