Our team is currently migrating DAGs to Airflow 2. One of the changes we're applying is moving from the deprecated BigQuery operators to the new ones, for example BigQueryInsertJobOperator.

The problem is that with these new operators the query is no longer passed directly to the operator. Instead, one must pass a configuration object (with the query as one of its properties) in JSON format.

Of course, Jinja2 allows a file containing the query to be included:

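from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# `location` is assumed to be defined elsewhere in the DAG file
select_query_job = BigQueryInsertJobOperator(
    task_id="select_query_job",
    configuration={
        "query": {
            # Jinja pulls the SQL text in from the file at render time
            "query": "{% include 'example_bigquery_query.sql' %}",
            "useLegacySql": False,
        }
    },
    location=location,
)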

but when looking at the rendered template of the task, instead of getting a nicely formatted query like

SELECT
   foo
FROM 
   bar
WHERE 
   x=y

it just ends up looking like

"query": {
    "query": "SELECT\n   foo\n   FROM\n   bar\n   WHERE\n   x=y\n",
    "useLegacySql": False,
}

It may not look so problematic with a tiny query, but we have queries that are thousands of lines long. As you can imagine, this makes debugging really tedious, not just because of the escaped newlines but also because everything ends up on one line and forces horizontal scrolling.
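To illustrate where the escaped newlines likely come from, here is a minimal sketch that uses only the standard library. It assumes the rendered view serializes the whole configuration in a JSON-like way, which is an assumption rather than something confirmed here. The `extract_sql` helper is hypothetical and only there for illustration.

```python
import json

# The same small query as above, as it would look after Jinja rendering
sql = "SELECT\n   foo\nFROM\n   bar\nWHERE\n   x=y\n"
configuration = {"query": {"query": sql, "useLegacySql": False}}

# Serializing the whole dict escapes every newline inside the string value,
# so the query collapses onto a single long line
rendered = json.dumps(configuration, indent=4)
print(rendered)


def extract_sql(config):
    """Hypothetical helper: return the raw SQL string from a query job configuration."""
    return config["query"]["query"]


# Printing the string itself keeps the original line breaks
print(extract_sql(configuration))
```

When debugging locally, printing the extracted string rather than the whole configuration keeps the query readable, although this does not change what the Airflow UI shows.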

Is there another way to read the query, or to tell Airflow to render it in a nicer format? As it is now, rolling back to the deprecated operators would be better than dealing with this format.

Sakshi Gatyan
Dasph
  • Did you ever find a solution to this? It's painful. – Dominik Jun 01 '22 at 20:38
  • I heard of one but I never applied it. You can create a custom class which extends the operator and there is a particular property which controls what goes in the rendered template, so you can load the SQL like in the old operators and get them rendered, then also include them in the new operator. – Dasph Jun 02 '22 at 08:21
  • Do you have any links you can share as examples for these options? – Dominik Jun 06 '22 at 16:51
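
The approach described in the comments might look roughly like the sketch below. It has not been tested against a real Airflow deployment. `template_fields` and `template_fields_renderers` are the class attributes Airflow operators use to declare which attributes are templated and how the rendered view displays them; the `sql` attribute, the `SqlJobOperator` name and the `make_sql_job_operator` factory are hypothetical names chosen for illustration. The factory form only keeps the snippet runnable without Airflow installed. In a DAG file the base class would be BigQueryInsertJobOperator.

```python
def make_sql_job_operator(base_operator):
    """Return a subclass of base_operator that also exposes the SQL as its own template field."""

    class SqlJobOperator(base_operator):
        # Keep the parent's template fields and add "sql" so it is rendered too
        template_fields = (*base_operator.template_fields, "sql")
        # Ask the rendered view to display the "sql" field as SQL
        template_fields_renderers = {
            **getattr(base_operator, "template_fields_renderers", {}),
            "sql": "sql",
        }

        def __init__(self, *, sql, **kwargs):
            # The job still runs from `configuration`, built here from the SQL
            super().__init__(
                configuration={"query": {"query": sql, "useLegacySql": False}},
                **kwargs,
            )
            # Stored separately only so that it shows up on its own when rendered
            self.sql = sql

    return SqlJobOperator


# Possible usage in a DAG file (assumes the Google provider is installed):
# from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
# BigQuerySqlJobOperator = make_sql_job_operator(BigQueryInsertJobOperator)
# select_query_job = BigQuerySqlJobOperator(
#     task_id="select_query_job",
#     sql="{% include 'example_bigquery_query.sql' %}",
#     location=location,
# )
```

With this, the rendered template would be expected to list `sql` as a separate field next to `configuration`, so the query can be read there while the configuration keeps its escaped form.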

0 Answers