1

I'm trying to generate a UUID within a with clause on GBQ and then use it in a subsequent with clause for a join.

with q1 as (
    select generate_uuid() as uuid 
), q2 as (
    select uuid from q1
)
select * from q1 
union all 
select * from q2

This returns two distinct uuid.

enter image description here

How would I go about generating an ID that stays the same across with clauses?

altabq
  • 1,322
  • 1
  • 20
  • 33

2 Answers2

2

I'll start with the reason behind this discrepancy.

TLDR; as of today there is no option to force BigQuery to materialize CTE results. It'd be useful when a CTE referenced more than once in a statement.

see below query:

with cte_1 as (
  select count(1) as row_count
  from `bigquery-public-data.austin_311.311_service_requests` as sr
)
, cte_2 as (
  select row_count
  from cte_1
)
select row_count
from cte_1
union all
select row_count
from cte_2;

when the execution plan is examined, you'll see 2 Input stage for referenced sr table.

execution plan for above query

It'd be great if we have an option to materialize CTE results. As I remember, oracle has this implicitly if CTE used more than once or explicitly via hints.

materializing q1 explicitly to table then use it twice might be a workaround. I'd prefer temporary table. the drawback is; the cost may increase if your project uses on-demand pricing. (rather than flat-rate)

ktopcuoglu
  • 146
  • 1
  • 3
1

Because subquery always executes each time it is called, so with your query, the generate_uuid will be called twice for both q1 and q2 tables. I suggest you save the generated UUID to a table, then query from this to make sure the UUID is the same from both tables.

Ka Boom
  • 486
  • 1
  • 3
  • 10