
I am learning Airflow and, as a practice exercise, I'm trying to create a table in Redshift through an Airflow DAG on MWAA. I created the connection to Redshift in the UI (specifying host, port, etc.) and ran the following DAG, but it fails at the "sql_query" task. Any idea how I can solve this problem or what could be causing it?

Script:

import os
from datetime import timedelta

from airflow import DAG
from airflow.models import Variable
from airflow.models.baseoperator import chain
from airflow.operators.dummy import DummyOperator
from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator
from airflow.utils.dates import days_ago


DEFAULT_ARGS = {
   "owner": "username",
   "depends_on_past": False,
   "retries": 0,
   "email_on_failure": False,
   "email_on_retry": False,
   "redshift_conn_id": "redshift_default", 
}

with DAG(
   dag_id="new_table_dag",
   description="",
   default_args=DEFAULT_ARGS,
   dagrun_timeout=timedelta(minutes=15),
   start_date=days_ago(1),  
   schedule_interval=None,
   tags=[""],
) as dag:
   
   begin = DummyOperator(task_id="begin")

   end = DummyOperator(task_id="end")
   
   sql_query = RedshiftSQLOperator(
       task_id="sql_query",
       sql="CREATE TABLE schema_name.table_a AS (SELECT * FROM table_b)",
   )

chain(begin,sql_query, end)
  • Any more details about how `sql_query` fails? Also, have you looked at this [example_redshift_sql.py](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/example_dags/example_redshift_sql.py)? – PApostol Jul 05 '22 at 19:07
  • Please add traceback of the failure from the log – Elad Kalif Jul 05 '22 at 19:16
  • There's an indentation error in the code. `chain(begin,sql_query, end)` needs to be indented within the `with DAG` context manager in order for the operators to be properly chained. – Andrew Nguonly Jul 06 '22 at 11:41
  • Thanks for the answers! I fixed the indentation error in the code but it's still not working. I also checked the logs but didn't find any details. Any other idea? – d4t4beaver Jul 06 '22 at 14:44
  • Have you confirmed that the connection from MWAA to Redshift is configured at the Airflow and network layers? For example, is the `redshift_default` connection configured (Airflow)? Is the Redshift security group configured to allow ingress from MWAA's security group (network)? – Andrew Nguonly Jul 13 '22 at 11:53

1 Answer


It is hard to tell from the info you have given, but here are some things to try out:

1/ Most of the issues I have seen when interacting with Redshift from MWAA are around VPC/security group configuration. Have you added an inbound rule to the Redshift security group that allows traffic from the MWAA security group?
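As a sketch, that ingress rule can also be added programmatically with boto3. All security group IDs below are placeholders, and `allow_mwaa_to_redshift` is a hypothetical helper, not part of any library:

```python
def allow_mwaa_to_redshift(ec2_client, redshift_sg_id, mwaa_sg_id, port=5439):
    """Authorize TCP ingress on the Redshift port (5439 by default)
    from the MWAA environment's security group into the Redshift
    cluster's security group.

    ec2_client is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    return ec2_client.authorize_security_group_ingress(
        GroupId=redshift_sg_id,  # Redshift cluster's security group (placeholder ID)
        IpPermissions=[
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # MWAA environment's security group (placeholder ID)
                "UserIdGroupPairs": [{"GroupId": mwaa_sg_id}],
            }
        ],
    )
```

You can find both group IDs in the EC2 console, or via `aws ec2 describe-security-groups`.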

2/ The second thing I often see is failing to add the required additional IAM permissions to the MWAA execution role. From the MWAA console you can easily find the execution role and check whether its policy includes access to Redshift.
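As a rough illustration, a statement attached to the execution role might grant Redshift actions like the following. This is a minimal sketch, not an exhaustive or recommended policy; in practice, scope `Resource` down to your specific cluster ARN:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "redshift:DescribeClusters",
        "redshift:GetClusterCredentials",
        "redshift-data:ExecuteStatement",
        "redshift-data:DescribeStatement",
        "redshift-data:GetStatementResult"
      ],
      "Resource": "*"
    }
  ]
}
```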

3/ The final thing is the version of the Amazon provider package and the operator you are using. Your DAG uses RedshiftSQLOperator, but have you looked at using RedshiftDataOperator instead? It goes through the Redshift Data API, so it authenticates with the execution role's IAM permissions rather than a host/port connection.