
It works fine until the remote server becomes unavailable for a while. When that happens, the server is marked DOWN in the logs and is never brought back UP. The config is quite simple:

defaults
    retries 3
    timeout connect 5000
    timeout client 3600000
    timeout server 3600000
    log global
    option log-health-checks

listen amazon_ses
    bind 127.0.0.2:1234
    mode tcp
    no option http-server-close
    default_backend bk_amazon_ses

backend bk_amazon_ses
    mode tcp
    no option http-server-close
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 120 rise 1

Here are the logs when the problem occurs:

Jul  3 06:45:35 jupiter haproxy[40331]: Health check for server bk_amazon_ses/amazon failed, reason: Layer4 timeout, check duration: 30004ms, status: 119/120 UP. 
Jul  3 06:46:35 jupiter haproxy[40331]: Health check for server bk_amazon_ses/amazon failed, reason: Layer4 timeout, check duration: 30003ms, status: 118/120 UP. 
Jul  3 06:47:35 jupiter haproxy[40331]: Health check for server bk_amazon_ses/amazon failed, reason: Layer4 timeout, check duration: 30002ms, status: 117/120 UP.
...
Jul  3 08:44:36 jupiter haproxy[40331]: Health check for server bk_amazon_ses/amazon failed, reason: Layer4 timeout, check duration: 30000ms, status: 0/1 DOWN. 
Jul  3 08:44:36 jupiter haproxy[40331]: Server bk_amazon_ses/amazon is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue. 
Jul  3 08:44:36 jupiter haproxy[40331]: backend bk_amazon_ses has no server available!
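The two-hour gap in the log follows from `fall 120`. Here is a rough worked estimate. It assumes the one-check-per-minute spacing visible in the timestamps, which is presumably the 30 s `inter` plus each check's roughly 30000 ms timeout:

$$
t_{\text{DOWN}} \approx (\text{fall} - 1)\,\Delta t = (120 - 1) \times 60\ \text{s} = 7140\ \text{s} = 1\ \text{h}\ 59\ \text{min}
$$

Counting from the first failed check at 06:45:35 (status 119/120), this lands at 08:44:35, within a second of the DOWN event at 08:44:36. A large `fall` value therefore only postpones the DOWN transition. It does not prevent it.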

And that's it. Nothing but a manual reload (`service haproxy reload` on FreeBSD) brings the server back. I also tried removing `check` and everything after it on the server line, and the same thing still happens. Can HAProxy be configured to keep retrying indefinitely instead of marking the server DOWN? Thanks.

Rihad
  • Have you tried using lower timeouts? Also using something like this? ` server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 5s fall 3 rise 2`. Also try adding these options to your backend: – Bogdan Stoica Jul 10 '18 at 05:08
  • Could be a DNS resolution issue, see https://stackoverflow.com/questions/45153680/ – augurar Mar 01 '19 at 19:24
  • I don't expect the cause to be DNS as trying to access the proxied external server via haproxy from telnet just blocks & times out, while accessing it directly works. For now I've set up a cron job that checks the log file for the presence of health check lines and does a reload (-sf) every 5 minutes if found. – Rihad Mar 02 '19 at 06:28
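If the cause does turn out to be DNS, as augurar suggests, the sketch below shows what enabling runtime resolution might look like. Without a `resolvers` reference, HAProxy resolves `email-smtp.us-west-2.amazonaws.com` once at startup. If the name later points elsewhere, HAProxy keeps using the old address until a reload. That would fit "only a reload helps", but the thread does not confirm it. The sketch makes the following assumptions:

* It assumes HAProxy 1.6 or later, where the `resolvers` section exists.
* It assumes a DNS resolver listening on 127.0.0.1:53.
* The section name `mydns` and the timing values are placeholders.
* Older releases tie runtime resolution to health checks, so `check` stays on the server line.

```haproxy
# Hypothetical resolvers section: the name "mydns" and 127.0.0.1:53 are placeholders
resolvers mydns
    nameserver dns1 127.0.0.1:53
    resolve_retries 3
    timeout retry 1s
    # reuse a valid DNS answer for 10s before resolving again
    hold valid 10s

backend bk_amazon_ses
    mode tcp
    # "resolvers mydns" lets HAProxy re-resolve the hostname at runtime
    # instead of keeping the address it resolved at startup
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 120 rise 1 resolvers mydns resolve-prefer ipv4
```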

2 Answers


Have you tried using lower timeouts? For example, something like this:

server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 5s fall 3 rise 2

Also try adding these options to your backend:

mode tcp
option tcplog
option log-health-checks

Then check your HAProxy logs again and see whether you get any additional information.
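The suggestions above can be combined into a sketch like the following. The `inter 5s fall 3 rise 2` values are the ones suggested above, not tuned numbers. The HAProxy documentation lists `option tcplog` as a frontend/listen setting rather than a backend one, so this sketch places it in the `listen` section. `option log-health-checks` is already set in `defaults` in the original config, so repeating it in the backend is harmless but redundant.

```haproxy
listen amazon_ses
    bind 127.0.0.2:1234
    mode tcp
    # richer per-connection TCP logging (a frontend/listen-side option)
    option tcplog
    default_backend bk_amazon_ses

backend bk_amazon_ses
    mode tcp
    # log every health-check status change, not only UP/DOWN transitions
    option log-health-checks
    # check every 5s; 3 consecutive failures mark it DOWN, 2 successes bring it back UP
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 5s fall 3 rise 2
```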

You could also try telnetting manually from the HAProxy box to that server on port 587 while HAProxy reports it as DOWN, and see whether you can connect. If you can't connect via telnet either, then it's normal for HAProxy to report it as down.

Is there any rate limiting, firewalling or any other similar configuration on your SMTP box that might block the haproxy server?

Bogdan Stoica
  • Yes, we used lower timeouts before; the whole point of increasing them was to avoid getting into the "permanently down" condition. When the server finally becomes reachable again, it stays down from HAProxy's point of view, which isn't normal. In other words, once you see lines such as: Jul 3 08:44:36 jupiter haproxy[40331]: Server bk_amazon_ses/amazon is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue. Jul 3 08:44:36 jupiter haproxy[40331]: backend bk_amazon_ses has no server available! then say goodbye to it; it will never be up again. – Rihad Jul 10 '18 at 09:22
  • Did you try to telnet from the haproxy server to the smtp server on port 587 and see if it works or not while haproxy reports the server as down ? – Bogdan Stoica Jul 10 '18 at 09:24
  • Yup, not reachable, connection is immediately closed. – Rihad Jul 10 '18 at 09:26
  • Well if telnet is not working then it's not a haproxy issue. – Bogdan Stoica Jul 10 '18 at 09:26
  • Sorry, I misread your question. When SMTP finally starts working, it is of course reachable by telnet. The immediately closed connection I described happens when telnetting to HAProxy at that time. – Rihad Jul 10 '18 at 09:36
  • Ok, right now the server is reachable, but haproxy health checks fail on it... – Rihad Jul 11 '18 at 11:15
  • $ telnet email-smtp.us-west-2.amazonaws.com 587 Trying 54.149.207.7... Connected to ses-smtp-us-west-2-prod-14896026.us-west-2.elb.amazonaws.com. Escape character is '^]'. 220 email-smtp.amazonaws.com ESMTP SimpleEmailService-2761973385 MBzXGl7jKNJqISNUXUpq ^] telnet> quit Connection closed – Rihad Jul 11 '18 at 11:15
  • Log: Jul 11 15:14:56 venus haproxy[1495]: Health check for server bk_amazon_ses/amazon failed, reason: Layer4 timeout, check duration: 30000ms, status: 840/1440 UP – Rihad Jul 11 '18 at 11:15
  • It's hard to say really without being able to actually check on the machine... So I really don't know what could cause that kind of behavior – Bogdan Stoica Jul 11 '18 at 11:16
  • I had to reload manually to get it to work. – Rihad Jul 11 '18 at 11:18

I had the same error with HAProxy 1.8. I don't understand what happened, but restarting the haproxy service resolved the problem.

Server xxxx is DOWN, reason: Layer4 timeout, check duration: 1001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Aug 25 16:10:18 localhost haproxy[5749]: backend yyyyy has no server available!