I want to use the data flowing through my pipeline to generate a query and execute it on BigQuery.
Let's say I have a Python SQL template like this:
template = '''
SELECT
  email
FROM
  `project_id.dataset_id.table_id`
WHERE
  email = '{runtime_email}'
'''
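For illustration, with a hard-coded value the template would be filled through the named placeholder like this:

query = template.format(runtime_email='example@test.com')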
I want to format this template so that runtime_email comes from the pipeline data (the element).
For example, the pipeline reads runtime_email from Pub/Sub with the value example@test.com, and then I would execute something like:
import apache_beam as beam

with beam.Pipeline(options=options) as p:
    bq_results = (
        p
        | LoadDataFromPubSub()
        | beam.io.Read(
            beam.io.BigQuerySource(
                # element should come from the pipeline data -- this is the part I'm missing
                query=template.format(runtime_email=element['runtime_email']),
                use_standard_sql=True,
            )
        )
    )
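For context, assume LoadDataFromPubSub emits dict elements such as {'runtime_email': 'example@test.com'}. A simplified sketch of such a transform could look like this (the JSON payload format and the subscription path are placeholders):

import json

import apache_beam as beam


class LoadDataFromPubSub(beam.PTransform):
    """Reads Pub/Sub messages and parses each payload into a dict."""

    def expand(self, pcoll):
        return (
            pcoll
            | beam.io.ReadFromPubSub(
                subscription='projects/project_id/subscriptions/subscription_id')  # placeholder
            | beam.Map(json.loads)  # e.g. {'runtime_email': 'example@test.com'}
        )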
Any ideas about how I can leverage the pipeline data to run the next step?
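For what it's worth, this rough sketch shows the per-element behavior I'm after, reusing the template above: format the query inside a DoFn and run it with the google-cloud-bigquery client (I don't know whether this is the right Beam pattern, which is why I'm asking):

import apache_beam as beam
from google.cloud import bigquery


class QueryPerElement(beam.DoFn):
    """Formats the query for one element and runs it against BigQuery."""

    def setup(self):
        # One client per DoFn instance.
        self.client = bigquery.Client()

    def process(self, element):
        query = template.format(runtime_email=element['runtime_email'])
        for row in self.client.query(query).result():
            yield dict(row.items())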