
I am currently using Airflow 1.8.2 to schedule some EMR tasks and then execute some long-running queries on our Redshift cluster. For that purpose I am using the postgres_operator. The queries take about 30 minutes to run. However, once they are done, the connection never closes and the operator keeps running for another hour and a half until it is terminated at the two-hour mark every time. The message on termination is that the server closed the connection unexpectedly.

I've checked the logs on Redshift's end and they show the queries have run and the connection has been closed. Somehow, that is never communicated back to Airflow. Any direction on what else I could check would be helpful. To give some more info: my Airflow installation is an extension of the https://github.com/puckel/docker-airflow Docker image, is run in an ECS cluster, and has SQLite as the backend since I am still testing Airflow out. I'm also using the SequentialExecutor. I would appreciate any help in this matter.
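For reference, a minimal sketch of the kind of task in question, assuming an Airflow 1.8-style DAG (the conn_id, SQL file, and dag object here are placeholders, not my actual code):

```python
from airflow.operators.postgres_operator import PostgresOperator

# Runs the ~30-minute query against Redshift via the postgres_operator
load_task = PostgresOperator(
    task_id="long_running_redshift_query",
    postgres_conn_id="redshift_default",  # Airflow connection for the cluster
    sql="sql/long_running_query.sql",
    dag=dag,
)
```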

shomo

1 Answer


We had a similar issue before, though I was using SQLAlchemy to connect to Redshift; if you are using the postgres_operator, it should be very similar. It seems Redshift will close the connection if it doesn't see any activity on a long-running query, and in your case 30 minutes is a pretty long query.

Check https://www.postgresql.org/docs/9.5/static/runtime-config-connection.html: there are three settings, tcp_keepalives_idle, tcp_keepalives_interval, and tcp_keepalives_count, that send a keepalive message to Redshift to indicate "Hey, I am still alive."

You can pass these as connection arguments, so something like this: connect_args={'keepalives': 1, 'keepalives_idle': 60, 'keepalives_interval': 60}
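A minimal sketch of wiring those arguments into a SQLAlchemy engine, assuming a Redshift connection over psycopg2 (the connection URL is a placeholder):

```python
from sqlalchemy import create_engine

# psycopg2 passes these keepalive options straight through to libpq
engine = create_engine(
    "postgresql+psycopg2://user:password@my-cluster.redshift.amazonaws.com:5439/mydb",
    connect_args={
        "keepalives": 1,            # enable TCP keepalive probes
        "keepalives_idle": 60,      # seconds of idle time before the first probe
        "keepalives_interval": 60,  # seconds between probes
    },
)
```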

Chengzhi
  • Thank you for your answer. I ended up defining a custom operator and hook that lets me override the keepalives parameter. – shomo Sep 27 '17 at 23:10
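A hypothetical sketch of what such a custom hook might look like, assuming Airflow 1.8 and psycopg2 (the class name KeepaliveRedshiftHook and the hard-coded keepalive values are illustrative, not the poster's actual code):

```python
from airflow.hooks.postgres_hook import PostgresHook
import psycopg2

class KeepaliveRedshiftHook(PostgresHook):
    """PostgresHook variant that enables TCP keepalives on the connection."""

    def get_conn(self):
        conn = self.get_connection(self.postgres_conn_id)
        return psycopg2.connect(
            host=conn.host,
            port=conn.port or 5439,
            user=conn.login,
            password=conn.password,
            dbname=conn.schema,
            # Keepalive probes keep idle NAT/firewall hops from dropping the socket
            keepalives=1,
            keepalives_idle=60,
            keepalives_interval=60,
        )
```

A custom operator can then instantiate this hook in place of the default PostgresHook.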