Sendmail 12 day delay after Deferred: Connection refused

Question

I have noticed a few emails have a massive delay (up to 12 days) after being deferred "Deferred: Connection refused"

Here is an example:

Nov  4 09:29:52 server0004 sendmail[4789]: sA49TLmW004726: to=<** Recipient Deleted **>, delay=00:00:31, xdelay=00:00:00, mailer=esmtp, pri=1806434, relay=mx01.inthehive.net. [109.68.0.70], dsn=4.0.0, stat=Deferred: Connection refused by mx01.inthehive.net.
Nov 16 12:11:44 server0004 sendmail[32326]: sA49TLmW004726: to=<** recipient Deleted ** >,  **delay=12+02:42:23**, xdelay=00:00:01, mailer=esmtp, pri=1896434, relay=mx01.inthehive.net. [109.68.0.70], dsn=2.0.0, stat=Sent (B546894210001 Message accepted for delivery)

/etc/sysconfig/sendmail contains:

DAEMON=yes
QUEUE=1h

From what I have read the QUEUE option configures sendmail to retry messages in the mailq to be retried every hour.

There are no other messages in the logfile for this message. I am struggling to work out why this waited for so long before being re-tried.

We are running Centos 6.5, sendmail-8.14.4-8.el6.x86_64 (This is a really old version, we are currently testing the latest version with our app)

Any pointers would be most welcome

EDIT: I should mention that most deferred emails do seem to send in a timely manner, we are sending ~160,000 emails per week from this server

EDIT: There are other entries of message aS49TLmW004726 in the sendmail logs between the 1st attempt and final delivery:

Nov  4 09:29:57 server0004 sendmail[4789]: sA49TLmW004726: to=<** Address Removed **>, delay=00:00:36, xdelay=00:00:05, mailer=esmtp, pri=1806434, relay=mx3.mail.icloud.com. [17.158.8.50], dsn=2.0.0, stat=Sent (Ok, envelope id 0NEI00IHCCZ30H20@nk11p00mm-smtpin005.mac.com)
Nov  4 09:30:00 server0004 sendmail[4789]: sA49TLmW004726: to=<** Address Removed **>, delay=00:00:39, xdelay=00:00:03, mailer=esmtp, pri=1806434, relay=mailin1.rmx.de. [217.111.120.6], dsn=2.0.0, stat=Sent (Ok: queued as 04CFC180964)
Nov  4 09:30:02 server0004 sendmail[4789]: sA49TLmW004726: to=<** Address Removed **k>, delay=00:00:41, xdelay=00:00:02, mailer=esmtp, pri=1806434, relay=mx.paulandsam.co.u...a.hostedemail.com. [216.40.42.4], dsn=2.0.0, stat=Sent (Ok Queued as wish11_2c048f6822d39)
Nov  4 09:30:06 server0004 sendmail[4789]: sA49TLmW004726: to=<** Address Removed **>,<** Address Removed **>,<** Address Removed **>,<** Address Removed **>, delay=00:00:45, xdelay=00:00:04, mailer=esmtp, pri=1806434, relay=mx-eu.mail.am0.yahoodns.net. [188.125.69.79], dsn=2.0.0, stat=Sent (ok dirdel 4/0)
Nov  4 09:30:06 server0004 sendmail[4789]: sA49TLmW004726: to=<** Address Removed **>,<** Address Removed **>, delay=00:00:45, xdelay=00:00:00, mailer=esmtp, pri=1806434, relay=mx-c1.talktalk.net. [62.24.202.3], dsn=2.0.0, stat=Sent (ok:  Message 702806395 accepted)
Nov  4 09:30:07 server0004 sendmail[4789]: sA49TLmW004726: to=<** Address Removed **>, delay=00:00:46, xdelay=00:00:01, mailer=esmtp, pri=1806434, relay=mxgb1.opaltelecom.net. [62.24.139.61], dsn=2.0.0, stat=Sent (ok:  Message 664665974 accepted)
Nov  4 09:30:09 server0004 sendmail[4789]: sA49TLmW004726: to=<** Address Removed **>, delay=00:00:48, xdelay=00:00:01, mailer=esmtp, pri=1806434, relay=mailserver.ukdpi.co.uk. [213.171.216.114], dsn=2.0.0, stat=Sent (Ok: queued as 9ABDFD60281)
Nov  4 09:30:11 server0004 sendmail[4789]: sA49TLmW004726: to=<** Address Removed **>, delay=00:00:50, xdelay=00:00:02, mailer=esmtp, pri=1806434, relay=aspmx.l.google.com. [74.125.195.26], dsn=2.0.0, stat=Sent (OK 1415093312 t19si273668wiv.66 - gsmtp)
Nov  4 09:30:11 server0004 sendmail[4789]: sA49TLmW004726: to=<** Address Removed **>,<** Address Removed **>, delay=00:00:50, xdelay=00:00:00, mailer=esmtp, pri=1806434, relay=mx01.inthehive.net., dsn=4.0.0, stat=Deferred: Connection refused by mx01.inthehive.net.
Nov  4 09:30:13 server0004 sendmail[4789]: sA49TLmW004726: to=<** Address Removed **>,<** Address Removed **>,<** Address Removed **>, delay=00:00:52, xdelay=00:00:02, mailer=esmtp, pri=1806434, relay=mx-eu.mail.am0.yahoodns.net. [188.125.69.79], dsn=2.0.0, stat=Sent (ok dirdel 3/0)

We send ~ 15 messages per week to 109.68.0.70

Are those the only two mentions of message `sA49TLmW004726` in the logfile? Or did sendmail try to deliver in the interim, and can we see some of those attempts as well? — MadHatter, Dec 01 '14 at 18:04
How many of those 160k mails/week are to mx01.inthehive.net (or better the ip it resolves to)? — r_3, Dec 01 '14 at 18:46
@NathanFoley thanks for that. This looks like a single email to a very large number of recipients, and those other lines look like the other recipients getting their copies within minutes of the mailing going out. What I'm really interested in is (some of) the other lines where delivery is tried to `inthehive.net` and fails, between Nov 4 and 16. — MadHatter, Dec 02 '14 at 09:47
@MadHatter Thanks for your reply. Yes, you are correct, looks like this email was sent to 50 recipients. Here are some of the other failures between 4/11 - 16/11 : Nov 4 12:52:09 server0004 sendmail[3811]: sA4Cq8dP003803: to=<** receipient removed **>, delay=00:00:01, xdelay=00:00:01, mailer=esmtp, pri=124654, relay=mx01.inthehive.net. [109.68.0.70], dsn=4.0.0, stat=Deferred: Connection refused by mx01.inthehive.net. — Nathan Foley, Dec 02 '14 at 13:12
Nov 10 09:13:16 server0004 sendmail[905]: sAA99fhr000807: to=<** receipient removed **>,<** receipient removed **>, delay=00:03:35, xdelay=00:00:00, mailer=esmtp, pri=5251070, relay=mx02.inthehive.net. [109.68.0.71], dsn=4.0.0, stat=Deferred: Connection refused by mx02.inthehive.net. — Nathan Foley, Dec 02 '14 at 13:12
Nov 12 11:08:51 server0004 sendmail[20613]: sACB8pjC020602: to=<** receipient removed **>, delay=00:00:00, xdelay=00:00:00, mailer=esmtp, pri=125214, relay=mx02.inthehive.net. [109.68.0.71], dsn=4.0.0, stat=Deferred: Connection refused by mx02.inthehive.net. — Nathan Foley, Dec 02 '14 at 13:14
Nathan, please edit these into your question; they're pretty much impossible to read with the loss of formatting that comments involve. Thanks! — MadHatter, Dec 02 '14 at 13:20

MadHatter · Answer 1 · 2014-12-05T15:01:51.140

3

Your question says you are "struggling to work out why this waited for so long before being re-tried", but it seems to me that those logs are definitive: your mail server isn't waiting 12 days before re-trying sending. It's retrying it quite often (though without all the logs I can't say how often), but the remote server is being consistently unavailable. So your question is unanswerable, because it's based on false predicates.

I, too, wonder why the remote server wasn't taking your calls, but assuming that you didn't have a complete failure of internet connection - which seems improbable - we have no way of knowing. It might be routing flaps, but that's beyond your control; it might be that the remote server doesn't like getting so much mail from you, so it's refusing to answer during busy periods, but that's also beyond your control; it might be that their server is simply overloaded (sendmail, for example, refuses connections when system load gets high), but again, that's beyond your control.

You might wish to try to communicate directly with the admin of that server, to put the question to him/her; but apart from that there's not much you can do - certainly, there doesn't seem to be much wrong with your server.

edited Dec 05 '14 at 15:01

answered Dec 02 '14 at 13:26

MadHatter

79,770
20
184
232

This 'answer' seems unnecessarily condescending; particularly disappointing for a mod, IMO. [link](http://serverfault.com/help/be-nice) – BE77Y Dec 05 '14 at 14:16
Firstly, I'm not a mod. Secondly, I'm sorry about that; the underlying issue that I hoped to get across is that the OP is misunderstanding the log entries and that most of their question is moot; inasmuch as there is a problem, it's not their problem. If you can clarify which parts strike you as condescending, I'll attempt to improve them. – MadHatter Dec 05 '14 at 14:24
Fair enough, my mistake on the mod status. I also agree with your conclusion, but as above - the communication of the point seemed a bit unnecessarily condescending - in particular your reference to the 'length of a piece of string', which in my view does nothing to further the point but is detrimental to conveying respect to OP. – BE77Y Dec 05 '14 at 14:32
@BE77Y I agree with you. I hope the above strikes you as a bit more sympathetic. – MadHatter Dec 05 '14 at 15:02
It does indeed - vastly improved, in my view. I appreciate you editing it thusly (and am happy to remove these comments which I suppose could be viewed as off-topic, if deemed necessary). – BE77Y Dec 05 '14 at 15:16
@BE77Y glad to hear it, and thank you for your feedback. I'd suggest that we leave the comments; I think it's good to show the use of civilised critical discussion in improving a posting, and any mod can remove the comments if (s)he sees fit. But do feel free to upvote the answer, if you'd like :-) – MadHatter Dec 05 '14 at 15:27
A fair point - in which case I will leave my comments as you suggest, to the discretion of the mods. – BE77Y Dec 05 '14 at 15:41
@MadHatter Thanks for your comments and apologies for the delay in responding. Perhaps I am misunderstanding the logs, for message sA49TLmW004726, I would have expected to see a retry every hour in the sendmail logs. Is this not the case? My original question was based on this, the message seemed like it had been attempted once but then not re-tried for many days, rather than the remote server refusing it every hour – Nathan Foley Dec 26 '14 at 15:13
@NathanFoley from what you've told us, we don't know that we're not seeing that. Could you edit into your question the result of `grep sA49TLmW004726 /var/log/maillog | grep 109.68.0.70`? – MadHatter Dec 28 '14 at 17:42

score 1 · Answer 2 · answered Dec 02 '14 at 15:44

I had a similar issue recently with one of the domains we relay for. There was a single message in the queue for that domain which caused their gateway to hang and not accept the message. This arrived while their gateway was unavailable for maintenance (so we built up a queue of messages for them before they were back in service).

On subsequent queue runs the duff message was tried, timed out and other messages queued for that domain were showed as retried without actually being tried (showed with xdelay=00:00:00). Once the domain was back in service new messages were delivered immediately but old messages stuck behind the duff one didn't really get retried and eventually were returned.

What we've done to work round this is add some regular queue runs with different ordering ...

# Puppet Name: queuerun-modtime
10 * * * * /usr/lib/sendmail -q -OQueueSortOrder=modtime
# Puppet Name: queuerun-random
40 * * * * /usr/lib/sendmail -q -OQueueSortOrder=random

Sendmail 12 day delay after Deferred: Connection refused

2 Answers2