1

We have several servers set up on Debian 11 (Bullseye). One of the processes we run on all servers at regular interval during the day, with cron, is a perl script that reads from a MySQL database and sends emails with relevant data. The number of emails vary but is usually between 5 and 20 per batch. Except during the night, where another process will compile information and prepare hundreds of emails. Those are usually all sent with the next batch. Except on one of our servers.

We installed this server a while ago and recently added a new client on it. This client generates a higher volume of emails than others. We noticed that this causes issues because on this particular machine, the emails are not sent as quickly as on others. This means that the Perl script will queue the pending emails again and they will start piling up.

The perl script communicates with Exim4 installed locally. The configuration we use is the same on all our servers. Here is the update-exim4.conf.con file:

dc_eximconfig_configtype='smarthost'
dc_other_hostnames=''
dc_local_interfaces='127.0.0.1 ; ::1'
dc_readhost='******.com'
dc_relay_domains=''
dc_minimaldns='false'
dc_relay_nets=''
dc_smarthost='1**.***.***.**6'
CFILEMODE='644'
dc_use_split_config='true'
dc_hide_mailname='true'
dc_mailname_in_oh='true'
dc_localdelivery='mail_spool'

As you can see, Exim will connect to another one of our servers, the smarthost running postfix, where every email is sent to its final destination.

The problem we have is that emails on this particular server will start sending really fast (~20/s) and then slow down, stall, freeze and eventually stop with the following error:

unable to establish SMTP connection to (localhost) port 25

Trace begun at /usr/share/perl5/Email/Sender/Transport/SMTP.pm line 224
Email::Sender::Transport::SMTP::_throw('Email::Sender::Transport::SMTP=HASH(0x55d021f74620)', 'unable to establish SMTP connection to (localhost) port 25') called at /usr/share/perl5/Email/Sender/Transport/SMTP.pm line 174
Email::Sender::Transport::SMTP::_smtp_client('Email::Sender::Transport::SMTP=HASH(0x55d021f74620)') called at /usr/share/perl5/Email/Sender/Transport/SMTP.pm line 233

It is my understanding that Exim will first save emails to disk, then add them to the queue, then send them (in our case) through the smarthost. If this is the case, it means that there might be some connection problem between the perl script and Exim. I have tried many configuration changes, to no avail so far.

When I run Exim from the command line with the -d+all flag, I get a lot of data. The interesting part, to me, is in the gaps:

[...]
12:33:00.959 394267 postfork: daemon-accept-delivery
12:33:00.959 394266 SMTP>>(close on process exit)
12:33:00.959 394266 >>>>>>>>>>>>>>>> Exim pid=394266 (daemon-accept) terminating with rc=0 >>>>>>>>>>>>>>>>
12:33:00.959 394267 exec /usr/sbin/exim4 -d=0xfff9ffff -MCd daemon-accept-delivery -Mc 1pY9HA-001eZ8-TG
12:33:00.960 394250 child 394266 ended: status=0x0
12:33:00.960 394250   normal exit, 0
12:33:00.960 394250 0 SMTP accept processes now running
12:33:00.960 394250 Listening...
[...]
12:33:02.998 394360 search_tidyup called
12:33:02.998 394360 search_tidyup called
12:33:02.998 394360 >>>>>>>>>>>>>>>> Exim pid=394360 (daemon-accept-delivery) terminating with rc=0 >>>>>>>>>>>
12:33:03.063 394365 read response data: size=75
12:33:03.063 394365   SMTP<< 250 2.1.0 Ok
12:33:03.063 394365 sync_responses expect rcpt
12:33:03.063 394365   SMTP<< 250 2.1.5 Ok
12:33:03.063 394365   SMTP<< 250 2.0.0 Ok: 1376 bytes queued as EDDF2F5F5E
[...]
12:33:03.087 394364 DSN: Remote SMTP server supports DSN: 1
12:33:03.087 394364 DSN: not sending DSN success message
12:33:03.087 394364 LOG: MAIN
12:33:03.087 394364   Completed QT=0.385s
12:33:03.087 394364 end delivery of 1pY9HC-001eag-Me
12:33:03.087 394364 search_tidyup called
12:33:03.087 394364 search_tidyup called
12:33:03.088 394364 >>>>>>>>>>>>>>>> Exim pid=394364 (daemon-accept-delivery) terminating with rc=0 >>>>>>>>>>>
**Why a 4 seconds gap here?**
12:33:07.772 394366 DNS lookup of localhost (AAAA) gave HOST_NOT_FOUND
12:33:07.772 394366 returning DNS_NOMATCH
12:33:07.772 394366 faking res_search(AAAA) response length as 65535
12:33:07.772 394366  writing neg-cache entry for localhost-AAAA-800043, ttl 86400
12:33:07.773 394366 DNS lookup of localhost (A) gave HOST_NOT_FOUND
[...]
12:33:08.417 394438 DSN: Remote SMTP server supports DSN: 1
12:33:08.417 394438 DSN: not sending DSN success message
12:33:08.417 394438 LOG: MAIN
12:33:08.417 394438   Completed QT=0.343s
12:33:08.417 394438 end delivery of 1pY9HI-001ebt-2N
12:33:08.417 394438 search_tidyup called
12:33:08.417 394438 search_tidyup called
12:33:08.417 394438 >>>>>>>>>>>>>>>> Exim pid=394438 (daemon-accept-delivery) terminating with rc=0 >>>>>>>>>>>
**Another 4-5 seconds gap**
12:33:13.146 394440 DNS lookup of ip6-localhost (AAAA) gave HOST_NOT_FOUND
12:33:13.146 394440 returning DNS_NOMATCH

The DNS lookup of localhost gave HOST_NOT_FOUND comes back for every message accepted and delivered so it's probably not what's causing the issue.

The gaps between emails become more and more frequent as the number of messages processed grows. The time each gap takes also lenghthens. Eventually there is a timeout.

Here is a summary of the Perl code. Note that the actual code is more complex, with many classes and subclasses, but the gist of it is this:

# Create SMTP server
$smtp_transport = Email::Sender::Transport::SMTP->new( $params );
my $unsent_messages = get_unsent_messages( $messages_args );

foreach my $message ( @$unsent_messages ) {
    # Create email and assign args
    my $email = Email::Stuffer->new();
    $email->to( $message->{to} );
    # [...] other arguments, headers, etc.
    $email->send_or_die( transport => $smtp_transport );
}

What could be causing this issue. How can I debug this and find out where the problem is coming from? The configuration is the exact same as what we have on our other servers, which can hundreds of email in less than a minute so I doubt it's a configuration setting. Could it somehow be a hardware difference?

Using Exim version 4.94.2 #2 built 13-Jul-2021 16:04:57 and Postfix mail_version = 3.4.23.

Update

A simple for loop (see below) sends messages with lightning speed and no delay. So the problem seems to be coming from Perl.

for f in {1..111}; do mail -s TestDeCourriel my@email.com <<< "Mail content"; done

Second update

The same loop with s-nail connecting to the local SMTP server instead of using mail / sendmail can reproduce the issue.

for i in {1..111}; do echo "This is a test message. $(date)" | s-nail -S mta=smtp://localhost:25 -s "Some s-nail mail" my@email.com; done
  • Check for any configured ratelimits - see https://www.exim.org/exim-html-current/doc/html/spec_html/ch-access_control_lists.html#SECTratelimiting – Steffen Ullrich Mar 03 '23 at 18:00
  • Unfortunately there is no ratelimit anywhere in the configuration files. – InLibrolivier Mar 03 '23 at 18:31
  • 1) How many email do you send via one TCP connection? [One Net::SMTP object estsblishes one SMTP connection] Many messages over single connection may lower system load. 2) Do you make Net::SMTP connect `localhost` or `127.0.0.1` ? – AnFi Mar 04 '23 at 06:18
  • I have added the basic structure of the perl code to the question. As far as I can tell, one email is sent for every SMTP connection. Right now it's connecting to `localhost`. Do you think using `127.0.0.1` instead would change something? – InLibrolivier Mar 06 '23 at 14:21

0 Answers0