I'm having trouble getting a templated SQL file to work in Composer. I think the problem is related to the fact that I'm packaging the DAG as a zip file in order to include additional code.
I started with this (just showing relevant parts):
dag = DAG('my_dag',
          default_args=default_args,
          schedule_interval=schedule_interval)

task0 = BigQueryOperator(
    task_id='task0',
    use_legacy_sql=False,
    bql='sql/query_file.sql',
    bigquery_conn_id=bigquery_conn_id,
    dag=dag)
The file structure looks like this:
/dags/my_dag_file.py
/dags/sql/query_file.sql
/dags/my_pkg/
/dags/my_pkg/__init__.py
/dags/my_pkg/extra_module.py
I'm zipping it like this and copying it to the Composer dags folder:
zip -r my_zip_file.zip *.py my_pkg/ sql/
This worked locally but gave an error when deployed on Composer:
TemplateNotFound: sql/query_file.sql
I'm sure I included the SQL file in the zip. I also tried moving it to the root folder (no sql/ subdirectory) but I got the same result.
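To rule out a packaging mistake, this is the kind of quick check I mean (just a sketch, using the file names from my layout above):

import zipfile

# List the archive's contents and confirm the SQL file really made it in.
with zipfile.ZipFile("my_zip_file.zip") as zf:
    print("sql/query_file.sql" in zf.namelist())

This prints True, so as far as I can tell the file is definitely inside the archive.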
I read somewhere that you need to set template_searchpath when instantiating the DAG object. I've not been able to do this successfully. When I try a relative path (sql) I get more TemplateNotFound errors. When I try an absolute path as below, I get 'not a directory'.
Here's what I tried:
dag = DAG('my_dag',
          default_args=default_args,
          schedule_interval=schedule_interval,
          template_searchpath=os.path.dirname(__file__) + "/sql")

task0 = BigQueryOperator(
    task_id='task0',
    use_legacy_sql=False,
    bql='query_file.sql',
    bigquery_conn_id=bigquery_conn_id,
    dag=dag)
I also tried making sql/ part of the task's path instead of putting it in the template searchpath, and I again tried moving everything to the root level; both attempts gave the same 'not a directory' error.
As far as I can tell, the problem has something to do with the files being contained in a zip. __file__ returns /home/airflow/gcs/dags/my_zip_file.zip/my_dag_file.py, but os.listdir(os.path.dirname(__file__)) throws the same 'not a directory' error. So maybe, because we're executing inside a zip archive, folders and paths can't be used in the usual way, and Jinja is getting tripped up by that? Or is there something more I need to do when packaging up the zip file?
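Here's a minimal sketch of what I mean about the zip getting in the way (the archive path is the one __file__ gave me above):

import os
import zipfile

archive = "/home/airflow/gcs/dags/my_zip_file.zip"

# Treating a path inside the zip as a normal directory fails...
try:
    os.listdir(os.path.join(archive, "sql"))
except OSError as e:
    print(e)  # [Errno 20] Not a directory

# ...but the same member is readable through the zipfile module.
with zipfile.ZipFile(archive) as zf:
    print(zf.read("sql/query_file.sql")[:100])

So Jinja's filesystem loader presumably hits the same wall when it tries to open() the template inside the archive. Here's the full traceback from the task log: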
[2018-06-20 15:35:34,837] {base_task_runner.py:98} INFO - Subtask: Traceback (most recent call last):
[2018-06-20 15:35:34,838] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/bin/airflow", line 27, in <module>
[2018-06-20 15:35:34,840] {base_task_runner.py:98} INFO - Subtask: args.func(args)
[2018-06-20 15:35:34,841] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
[2018-06-20 15:35:34,841] {base_task_runner.py:98} INFO - Subtask: pool=args.pool,
[2018-06-20 15:35:34,842] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
[2018-06-20 15:35:34,843] {base_task_runner.py:98} INFO - Subtask: result = func(*args, **kwargs)
[2018-06-20 15:35:34,843] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1477, in _run_raw_task
[2018-06-20 15:35:34,844] {base_task_runner.py:98} INFO - Subtask: self.render_templates()
[2018-06-20 15:35:34,844] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1760, in render_templates
[2018-06-20 15:35:34,845] {base_task_runner.py:98} INFO - Subtask: rendered_content = rt(attr, content, jinja_context)
[2018-06-20 15:35:34,847] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 2481, in render_template
[2018-06-20 15:35:34,848] {base_task_runner.py:98} INFO - Subtask: return jinja_env.get_template(content).render(**context)
[2018-06-20 15:35:34,849] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line 812, in get_template
[2018-06-20 15:35:34,849] {base_task_runner.py:98} INFO - Subtask: return self._load_template(name, self.make_globals(globals))
[2018-06-20 15:35:34,850] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line 774, in _load_template
[2018-06-20 15:35:34,851] {base_task_runner.py:98} INFO - Subtask: cache_key = self.loader.get_source(self, name)[1]
[2018-06-20 15:35:34,852] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/jinja2/loaders.py", line 171, in get_source
[2018-06-20 15:35:34,854] {base_task_runner.py:98} INFO - Subtask: f = open_if_exists(filename)
[2018-06-20 15:35:34,855] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/jinja2/utils.py", line 151, in open_if_exists
[2018-06-20 15:35:34,856] {base_task_runner.py:98} INFO - Subtask: return open(filename, mode)
[2018-06-20 15:35:34,856] {base_task_runner.py:98} INFO - Subtask: IOError: [Errno 20] Not a directory: '/home/airflow/gcs/dags/my_zip_file.zip/sql/query_file.sql'