
I use Airflow for various ETL work, but I've also started using BigQuery UDFs heavily.

I'd like to organize my UDFs in a dataset my_project.my_udfs, and I was hoping to be able to utilize Airflow for this purpose. Is there a way to do so?

I ultimately want to be able to schedule queries like this simple example:

CREATE FUNCTION `my_project.my_udfs.normalize`(s STRING)
  RETURNS STRING
  AS (TRIM(LOWER(s)));

A couple of answers to questions you may be thinking of:

  1. I'm part of a broader organization that uses Airflow, and the main benefit I want to leverage here is to have source control over these functions.
  2. The example above isn't such a case, but many of these functions will be updated periodically (monthly/quarterly).

Thanks in advance!

Paul Raff
  • Does this answer your question? [Airflow's bigqueryoperator not working with udf](https://stackoverflow.com/questions/50527704/airflows-bigqueryoperator-not-working-with-udf) – Jose Gutierrez Paliza Feb 25 '22 at 20:05

1 Answer


If you're storing your UDF inside BigQuery, you can use a BigQuery hook and pass in some basic SQL to execute it.

BigQuery Hooks

from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

# use_legacy_sql=False so standard-SQL references like `my_project.my_udfs.normalize` work
bq_hook = BigQueryHook(gcp_conn_id="my_gcp_conn", use_legacy_sql=False)  # "my_gcp_conn" is your Airflow connection ID
results = bq_hook.get_records("select * from my_table")  # runs the query and returns the rows

Replace my_table with a query that calls your UDF; that should return a result set for you.
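
If the goal is to schedule the CREATE FUNCTION DDL itself, as in the question, another option is BigQueryInsertJobOperator from the same google provider. A minimal sketch, assuming a hypothetical connection ID of my_gcp_conn:

from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

UDF_SQL = """
CREATE OR REPLACE FUNCTION `my_project.my_udfs.normalize`(s STRING)
  RETURNS STRING
  AS (TRIM(LOWER(s)));
"""

with DAG(
    dag_id="deploy_udfs",
    schedule_interval="@monthly",  # matches the monthly/quarterly cadence in the question
    start_date=datetime(2022, 1, 1),
    catchup=False,
) as dag:
    BigQueryInsertJobOperator(
        task_id="deploy_normalize_udf",
        gcp_conn_id="my_gcp_conn",  # hypothetical Airflow connection ID
        configuration={"query": {"query": UDF_SQL, "useLegacySql": False}},
    )

Using CREATE OR REPLACE FUNCTION keeps the task idempotent, which also covers the periodic updates mentioned in the question.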

Alternatively, if you don't have persistent UDFs or want to pass something in each time, you could store the SQL in a file that lives next to your Python code and read it from there when you want to execute it.
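
A minimal sketch of that file-based approach, assuming the DDL lives in a normalize.sql file (hypothetical name) next to the DAG code:

from pathlib import Path
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

def deploy_udf_from_file():
    # read the DDL that lives alongside this module
    sql = (Path(__file__).parent / "normalize.sql").read_text()
    hook = BigQueryHook(gcp_conn_id="my_gcp_conn", use_legacy_sql=False)  # hypothetical connection ID
    hook.run(sql)  # DDL returns no rows, so run() rather than get_records()

You can call this from a PythonOperator task, and source control then covers both the DAG and the SQL files.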

Thom Bedford