
I am using Airflow 2.0.2 to connect to Databricks using the airflow-databricks-operator. The SQL operator doesn't let me specify the database where the query should be executed, so I have to prefix the table_name with the database_name. I tried reading through the databricks-sql-connector docs here -- https://docs.databricks.com/dev-tools/python-sql-connector.html -- and still couldn't figure out whether I can pass the database name as a parameter in the connection string itself.
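
For reference, this is roughly how I end up querying today with databricks-sql-connector (a sketch only -- the hostname, HTTP path, token, and table names are placeholders):

```python
# Sketch of the current situation: the table name has to be fully qualified,
# because I couldn't find a way to set a default database on the connection.
# server_hostname, http_path, access_token and table names are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/endpoints/abcdef1234567890",
    access_token="dapiXXXXXXXXXXXXXXXX",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM my_database.my_table LIMIT 10")
        print(cursor.fetchall())
```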

I tried setting database/schema/namespace in the **kwargs, but no luck. The query executor keeps saying that the table is not found, because the query keeps getting executed in the default database.


1 Answer


Right now it's not supported - the primary reason is that if you have multiple statements, the connector could reconnect between their execution, and the effect of the USE statement would be lost. databricks-sql-connector also doesn't allow setting the default database.

Right now you can work around that by adding an explicit USE <database> statement to the list of SQL statements to execute (the sql parameter can be a list of strings, not just a single string), as in the sketch below.
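
For example, something like this (a sketch only, assuming the DatabricksSqlOperator from the apache-airflow-providers-databricks package; the connection id, SQL endpoint name, database, and table names are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks_sql import DatabricksSqlOperator

with DAG(
    dag_id="databricks_sql_use_database",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
) as dag:
    # sql is a list of statements: the first one switches the default
    # database, so the second one can reference a bare table name.
    query_my_table = DatabricksSqlOperator(
        task_id="query_my_table",
        databricks_conn_id="databricks_default",   # placeholder connection id
        sql_endpoint_name="my-sql-endpoint",       # placeholder SQL endpoint
        sql=[
            "USE my_database",
            "SELECT * FROM my_table LIMIT 10",
        ],
    )
```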

P.S. I'll have a look; maybe I'll add setting of the default catalog/database in the next versions.

  • Wow, thanks. It is insane to not have this ability. I am looking to move from Snowflake and am doing a POC with Databricks while keeping the system completely compatible with the existing Snowflake setup. This is the only hiccup. – Pbd Apr 16 '22 at 19:08
  • One question - do you want to have these settings at the level of the connection, or in the operators? Can you drop me an email at alexott at gmail.com? – Alex Ott Apr 17 '22 at 07:28
  • PR is submitted: https://github.com/apache/airflow/pull/22223 – Alex Ott Apr 24 '22 at 15:14