Background

We have a worker application server that runs long-running report export jobs. Since these are export jobs, we connected the worker to a (managed, not serverless) Aurora database cluster with a write master and read replicas that auto-scale according to the following scaling policy: [image: scaling policy]

The worker uses the reader endpoint that comes out of the box with the DB cluster, which should distribute load evenly across the existing readers.

Problem

We noticed that export jobs attempting to connect to the read replicas fail with this error:

SQLSTATE[HY000]: General error: 7 SSL SYSCALL error: EOF detected

We verified that this error occurs exactly when auto-scaling happens by cross-referencing the timing of the errors:

[image: timing of errors vs. auto-scaling]

To make matters worse, all subsequent export attempts fail with the same error, even after auto-scaling has finished. The only way we were able to fix this was by restarting the worker application servers.

Question

What can we do to let our database cluster gracefully handle incoming DB connections to the read replicas while it is scaling? Or, how do we force the worker to establish a new connection when it finds that the current one has been terminated?
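
To make the second option concrete, this is roughly the behaviour we are after, written as a minimal Python sketch rather than our actual worker code. `ConnectionLost`, `ReconnectingClient`, and the `connect` factory are hypothetical names standing in for whatever the real driver exposes, and the sketch assumes the operation is read-only, so repeating it on a fresh connection is safe.

```python
from typing import Any, Callable, Optional


class ConnectionLost(Exception):
    """Hypothetical stand-in for the driver's 'connection terminated' error."""


class ReconnectingClient:
    """Runs operations on a cached connection and reconnects if it has died."""

    def __init__(self, connect: Callable[[], Any], max_retries: int = 1) -> None:
        self._connect = connect          # hypothetical factory returning a live connection
        self._max_retries = max_retries  # reconnect attempts per operation
        self._conn: Optional[Any] = None

    def run(self, operation: Callable[[Any], Any]) -> Any:
        attempts = 0
        while True:
            if self._conn is None:
                # Open a fresh connection instead of reusing a dead handle.
                self._conn = self._connect()
            try:
                return operation(self._conn)
            except ConnectionLost:
                # Discard the stale connection so later calls do not reuse it.
                self._conn = None
                attempts += 1
                if attempts > self._max_retries:
                    raise
```

The important part is that the dead handle is dropped on failure; without that, every later job keeps hitting the same terminated connection, which matches what we see until the worker is restarted.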

Hussein Hijazi
abbood
  • I'm not sure if this would solve your issue, but mine was some plv8 code running in a function/trigger that didn't seem to play nice after upgrading to Postgres 13. My issue was still intermittent, but removing the trigger and calls to the function in my application got me around the issue. – Steve Robbins Jun 02 '22 at 00:00

0 Answers