
I am running Airflow v2.2.2 using the MWAA service on AWS.

I have the following DAG

import airflow
from airflow import DAG
from airflow.operators.python import PythonOperator
import time
from datetime import timedelta

import snowflake.connector
from airflow.hooks.base import BaseHook

datestr = time.strftime("%Y%m%d")

connection = BaseHook.get_connection('test_snowflake')
snow_user = connection.login
snow_pass = connection.password
snow_host = connection.host
snow_schema = connection.schema

default_args = {
    "owner": "test",
    "depends_on_past": False,
    "email": ["info@test.com"],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5)
}

def create_external_table():
    source = "S3_CMS_CMISDB_CMSTRFVW"
    ctx = snowflake.connector.connect(
        user=snow_user,
        password=snow_pass,
        account=snow_host,
        database="aflow_test",
        schema=snow_schema
    )

    cur = ctx.cursor()
    try:
        cur.execute("USE ROLE DF_DEV_SYSADMIN_FR;")
        cur.execute("USE WAREHOUSE DF_DEV_SERVICE_AIRFLOW_WH;")
        cur.execute("USE DATABASE RAW_DB;")
        path = 'cms/cmisdb/cmstrfvw/dateday=' + datestr
        sql_create = (
            "CREATE OR REPLACE EXTERNAL TABLE " + source +
            " with location = @stage_devraw/" + path +
            " auto_refresh = true file_format = (type = parquet);"
        )
        print(f"sql_create: {sql_create}")
        cur.execute(sql_create)
    finally:
        # Always release the cursor and the connection.
        cur.close()
        ctx.close()


dag = DAG(
    "simple_dag",
    default_args=default_args,
    start_date=airflow.utils.dates.days_ago(1),
    description="generic simple dag",
    schedule_interval="@daily",
    catchup=False,
)


create_external_table_task = PythonOperator(
    task_id='create_external_table',
    python_callable=create_external_table,
    dag=dag,
)

# flow
create_external_table_task

When I execute this DAG, I get the following error: `Failure using stage area. Cause: [The AWS Access Key Id you provided is not valid.]`

The connection for aws_default is set up with a valid AWS access key and secret key. I also have variables set up for aws_access_key and aws_secret_key.

Does anyone have any suggestions on what I need to do to be able to connect to Snowflake from Airflow?

Damien
  • Why are you not using `SnowflakeOperator` from airflow snowflake provider? – Elad Kalif Feb 17 '22 at 12:59
  • This is code I have inherited that ran on Airflow 1.10.12. Would you recommend looking at the SnowflakeOperator? – Damien Feb 17 '22 at 13:09
  • 1
    You are basically implementing code that airflow already has... Airflow knows to connect to snowflake and runs queries. There is no reason to maintain this code for yourself. – Elad Kalif Feb 17 '22 at 13:30
  • Check also whether the issue is permissions to list files from Snowflake: https://stackoverflow.com/a/62396251/14624409 – Elad Kalif Feb 17 '22 at 13:35

1 Answer


The error you are getting is not really Airflow related; it comes from Snowflake. Most likely the credentials provided are missing the List permission on the stage's S3 location. See this answer for more information about it.

Note also that if you are using Airflow, you can use its power: there is no need to implement code that Airflow already supports. You created a PythonOperator that connects to Snowflake and runs queries. For this use case, simply use the SnowflakeOperator.

Example:

from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

snowflake_op_sql_str = SnowflakeOperator(
    task_id='snowflake_sql',
    snowflake_conn_id='my_snowflake_conn',
    sql="SELECT 1",
    warehouse="my_warehouse",
    database="my_db",
    schema="my_schema",
    role="my_role",
)

Using the operator saves you the trouble of authenticating, managing cursors, etc. You will need to define a Snowflake connection in Airflow. You can use this doc for information about it.
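One way to define such a connection on MWAA without touching the UI is through an environment variable: Airflow parses any `AIRFLOW_CONN_<CONN_ID>` value as a connection URI. A minimal sketch of building that URI with only the standard library; every credential and name here (`my_user`, `my_account`, the `my_snowflake_conn` id, etc.) is a placeholder, not something from the question:

```python
import os
from urllib.parse import quote, urlencode

# Hypothetical placeholder credentials -- substitute your real values.
user, password, schema = "my_user", "my_p@ss", "my_schema"
extras = {"account": "my_account", "warehouse": "my_warehouse",
          "database": "my_db", "role": "my_role"}

# Special characters in the login and password must be percent-encoded,
# or Airflow will mis-parse the URI.
uri = (f"snowflake://{quote(user)}:{quote(password, safe='')}"
       f"@/{schema}?{urlencode(extras)}")

# Airflow reads AIRFLOW_CONN_<CONN_ID> (upper-cased conn id) at runtime.
os.environ["AIRFLOW_CONN_MY_SNOWFLAKE_CONN"] = uri
print(uri)
```

A task using `snowflake_conn_id='my_snowflake_conn'` would then resolve this connection without it being stored in the metadata database.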

If for some reason the SnowflakeOperator doesn't have the functionality you need, you can create a custom operator or use the SnowflakeHook within a PythonOperator.

Elad Kalif