I have some basic questions regarding SMTP connections, particularly for Postfix but would appreciate advice for MS Exchange too:

  • When a single relay for a given remote domain becomes unavailable, how frequently does a Postfix MTA that is sending mail to that domain check back to see whether the service is available again?
  • Are failed connections "remembered" (i.e. cached) at all?
  • Does Postfix ever check whether "lower priority" relays are online while a "higher priority" relay is available?
  • Is either of the above tunable in Postfix?
Andy
  • What do you mean by 'remember failed connection'? – masegaloeh May 12 '15 at 11:12
  • To attempt some parallels - sssd runs a "service_check_alive" heartbeat in the background periodically. Does postfix simply attempt to blindly deliver every email expecting the service to be up, and failing over if its not? Or can it keep track of the availability of a given relay/domain/mx target? – Andy May 12 '15 at 12:19
  • If the mx records for example.com are m1.example.com (crashed) and m2.example.com(available), of equal priority, and my local mta were to send 100 emails to that domain, could i expect 50 failed connections to m1.example.com? Or just periodic failed attempts after one failed connection? – Andy May 12 '15 at 12:27

1 Answer

Disclaimer: this answer is based only on Postfix documentation available online, so I may have missed some details. For a more precise answer, consider asking on the Postfix mailing list (Wietse Venema is active there) or reading the Postfix source code.

All of the questions above come down to Postfix's backoff algorithm. First, I'll address the dead destination issue.

Here is the relevant portion of man 8 qmgr:

STRATEGIES
   The queue manager implements a variety of strategies for either opening
   queue files (input) or for message delivery (output).
   ...
   destination status cache
          The queue manager avoids unnecessary delivery attempts by
          maintaining a short-term, in-memory list of unreachable
          destinations.

Based on the above, Postfix does keep a cache of dead (unreachable) destinations. The size of that cache is controlled by the qmgr_message_recipient_limit parameter.

qmgr_message_recipient_limit (default: 20000)

The maximal number of recipients held in memory by the Postfix queue manager, and the maximal size of the short-term, in-memory "dead" destination status cache.

So when does Postfix check whether a host is alive? Postfix only tries to connect to a host when the active queue contains a message destined for it. Other than that, Postfix does not actively check whether the host has come back.
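To picture how a short-term "dead" destination list avoids repeated connection attempts, here is a toy model in Python. It is not Postfix's code: the class, the function and their parameters are hypothetical names made up for illustration. The 300-second lifetime is an assumption based on my reading of the minimal_backoff_time documentation, which says that parameter also limits how long an unreachable destination stays in the cache; the size limit stands in for qmgr_message_recipient_limit.

```python
class DeadDestinationCache:
    """Toy short-term, in-memory list of unreachable destinations (not Postfix code)."""

    def __init__(self, ttl_seconds=300, max_entries=20000):
        # ttl_seconds: assumed lifetime of an entry (see minimal_backoff_time)
        # max_entries: stands in for qmgr_message_recipient_limit
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._dead_until = {}

    def mark_dead(self, destination, now):
        # Only add a new entry while there is room; refreshing is always allowed.
        if destination in self._dead_until or len(self._dead_until) < self.max_entries:
            self._dead_until[destination] = now + self.ttl

    def is_dead(self, destination, now):
        expiry = self._dead_until.get(destination)
        if expiry is None:
            return False
        if now >= expiry:
            # Entry expired: forget it so the next message triggers a real attempt.
            del self._dead_until[destination]
            return False
        return True


def count_connection_attempts(message_times, destination, cache, is_reachable):
    """Count real connection attempts for messages arriving at the given times."""
    attempts = 0
    for now in message_times:
        if cache.is_dead(destination, now):
            continue  # deferred without touching the network
        attempts += 1
        if not is_reachable(now):
            cache.mark_dead(destination, now)
    return attempts


# 100 messages within 100 seconds to a destination that stays down
burst = count_connection_attempts(
    range(100), "example.com", DeadDestinationCache(), lambda now: False
)
print(burst)
```

In this model a burst of 100 messages causes a single real connection attempt; the remaining messages are deferred from the cache until the entry expires.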

Postfix only tries a lower priority MX host if the primary host is unavailable or if Postfix gets a 4xx error code from the remote host. Other MTAs behave differently when they get a 4xx error code: they may never try the secondary MX host as long as they can connect to the primary host. See: postfix destination full/busy/error try another destination and Exchange don't send email to second MX
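The fallback order described above can be sketched roughly as follows. This is a simplified illustration, not Postfix's implementation: try_mx_hosts and the attempt callback are hypothetical, a return value of None stands for "could not connect", and treating a 5xx reply as an immediate bounce is an assumption of the sketch.

```python
def try_mx_hosts(mx_records, attempt):
    """Try MX hosts in order of preference (lowest number first).

    mx_records: list of (preference, hostname) tuples.
    attempt:    hypothetical callback returning an SMTP reply code,
                or None when the connection could not be established.
    Returns (outcome, hosts_tried).
    """
    tried = []
    for _, host in sorted(mx_records, key=lambda record: record[0]):
        tried.append(host)
        code = attempt(host)
        if code is not None and 200 <= code < 300:
            return "delivered", tried
        if code is not None and 500 <= code < 600:
            # Assumption: a permanent error ends the attempt.
            return "bounced", tried
        # Connection failure or 4xx: fall through to the next MX host.
    return "deferred", tried


# Primary is down, secondary accepts the message
replies = {"m1.example.com": None, "m2.example.com": 250}
result = try_mx_hosts(
    [(20, "m2.example.com"), (10, "m1.example.com")],
    lambda host: replies[host],
)
print(result)
```

Note that the secondary host is contacted only after the primary has failed; nothing in this flow probes m2.example.com while m1.example.com is still accepting mail.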


Regarding message retry time: if a delivery fails, Postfix puts the message in the deferred queue and keeps retrying until bounce_queue_lifetime (for bounces generated by Postfix) or maximal_queue_lifetime (for all other messages) expires. As noted above, Postfix only retries delivery once qmgr moves the message back into the active queue. Here is the relevant excerpt from the Postfix documentation on the scheduling algorithm:

Each deferred queue scan only brings a fraction of the deferred queue back into the active queue for a retry. This is because each message in the deferred queue is assigned a "cool-off" time when it is deferred. This is done by time-warping the modification time of the queue file into the future. The queue file is not eligible for a retry if its modification time is not yet reached.

The "cool-off" time is at least $minimal_backoff_time and at most $maximal_backoff_time. The next retry time is set by doubling the message's age in the queue, and adjusting up or down to lie within the limits. This means that young messages are initially retried more often than old messages.
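As a rough illustration of the quoted rule, the sketch below computes a retry schedule. It rests on my reading that "doubling the message's age" means the cool-off equals the message's current age, clamped to the two limits. The function names are hypothetical, the limits of 300s and 4000s are assumed to be the defaults on recent Postfix versions, and the sketch ignores that retries only happen when a deferred queue scan (every queue_run_delay) picks the message up.

```python
def cool_off(age_seconds, minimal_backoff=300, maximal_backoff=4000):
    """Cool-off before the next retry: the current age, clamped to the limits."""
    return max(minimal_backoff, min(age_seconds, maximal_backoff))


def retry_schedule(age_at_first_failure, attempts,
                   minimal_backoff=300, maximal_backoff=4000):
    """Message ages (in seconds) at which the next retries become eligible."""
    ages = []
    age = age_at_first_failure
    for _ in range(attempts):
        age += cool_off(age, minimal_backoff, maximal_backoff)
        ages.append(age)
    return ages


# A message that first fails 10 seconds after it entered the queue
schedule = retry_schedule(10, 6)
print(schedule)  # [310, 620, 1240, 2480, 4960, 8960]
```

The gaps grow from 300s to 4000s and then stay there, which matches the statement that young messages are retried more often than old ones.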

So, if you want to tune the retry time, adjust the parameters minimal_backoff_time, maximal_backoff_time and queue_run_delay.

masegaloeh
  • Thanks @masegaloeh, that's very helpful. From the qmgr link, i suspect transport_retry_time is the setting I'm mainly concerned about, though I'll work my way through all of those you've suggested. – Andy May 15 '15 at 13:23
  • Transport and destination are different terms in Postfix. A broken transport is like a broken car, whereas a broken/dead destination is like a blocked road. `transport_retry_time` only applies when the 'car' is broken, not when the 'road' is blocked – masegaloeh May 15 '15 at 13:54