IMAP scalability

Question

I am creating a system which allows members to email content to it and have it instantly and automatically parsed.

I asked a question regarding using a custom SMTP server for intercepting mail. One of the reasons for this is that it's easy to scale up should demand increase (just set up a new mail server). One of the suggestions was to use IMAP IDLE instead to monitor a single mailbox.

My question is, how scalable is this? If I had one mailbox with a wildcard email alias to capture all mail, how much mail could it handle at one time without grinding to a halt? If I were to go the route of creating a separate mailbox per member, what would be the best way to monitor each one?

I think your question is too vague for someone to answer it satisfactorily. What is it that you are trying to build? Messages come into your system either via SMTP or IMAP submissions and are stored in an appropriate mailbox. What next? Who needs the instant notification? The mailbox "owner"? All users on the system? If I were to simply answer your 3rd paragraph, I would say that it depends on the hardware, IMAP server software, storage format and maximum file size – adamo, Mar 13 '11 at 10:13
@adamo the system I am building is a generic email-to-application system. When mail arrives, the system should be notified to trigger the automatic parsing. Once parsed, the system then passes the email contents on to the registered application. The system will start with only a handful of registered apps, handling 100s of emails per day, but this could easily escalate, so I just want to make sure I don't back myself into a corner that won't scale. – Matt Brailsford, Mar 13 '11 at 10:42

score 1 · Answer 1 · answered Mar 13 '11 at 10:07


Would you be clearing out the mail once you'd parsed it? If so, having multiple mailboxes on the same server probably wouldn't give you any performance benefit, as the mail files would probably end up being stored on the same volume anyway.

What sort of volume of incoming mail are you expecting this system to handle? The bottleneck will almost certainly be your parsing code rather than the mailserver itself. There's no reason why a single IMAP server shouldn't handle thousands of messages a minute.

NB: We handle support tickets using a single mailbox which is polled every minute using POP3. Each message is parsed for account number / support token and dumped into a database, and it easily handles thousands every hour.
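The polling approach described above could be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: the `[TICKET-12345]` subject format, `extract_token` and `poll_once` are all assumptions made for the example. Only the `poplib` calls `stat()`, `retr()` and `dele()` come from the Python standard library.

```python
import re
from email import message_from_bytes, policy

# Hypothetical token format, e.g. "[TICKET-12345]" somewhere in the subject.
TOKEN_RE = re.compile(r"\[TICKET-(\d+)\]")


def extract_token(raw: bytes):
    """Return the ticket number found in the Subject header, or None."""
    msg = message_from_bytes(raw, policy=policy.default)
    match = TOKEN_RE.search(str(msg.get("Subject", "")))
    return match.group(1) if match else None


def poll_once(conn, handle):
    """Fetch, handle and mark for deletion every message on an open POP3 session.

    `conn` is expected to behave like an authenticated poplib.POP3 object.
    `handle(token, raw)` is a hypothetical callback, e.g. a database insert.
    """
    count, _mailbox_size = conn.stat()
    for number in range(1, count + 1):
        _response, lines, _octets = conn.retr(number)
        raw = b"\r\n".join(lines)
        handle(extract_token(raw), raw)
        # POP3 only commits deletions when the session ends with quit().
        conn.dele(number)
    return count
```

In use, `conn` would be something like a `poplib.POP3_SSL` object that has already logged in with `user()` and `pass_()`, and a scheduler would call `poll_once` once a minute followed by `quit()`.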

Your biggest problems will probably lie with spam prevention/rejection and mail-bombs.

answered Mar 13 '11 at 10:07

Steve Mayne

my thoughts are (bearing in mind, my app will probably be on a different server to the mail server) when the app connects, it downloads the mail and saves it to the filesystem, removing the one from the mailbox. My only reasoning being, it should be quicker for the app to deal with the filesystem than a remote mailserver? – Matt Brailsford Mar 13 '11 at 10:46
in terms of volumes, I'm expecting an initial range of around 1000 emails per day, but I am expecting this to grow significantly (I don't have any solid projections though). I think the bottleneck of parsing might be the area I'm trying to get at. If the system grows, realistically how much mail could you handle before you start noticing delays? And would multiple mailboxes negate this? – Matt Brailsford Mar 13 '11 at 10:53
Having multiple mailboxes would give you an easy way to devote different parsers to different boxes, but I think you might be trying to over-optimise at the moment. If you're only expecting 1000 messages per day to start with, I'd go with the simple solution (one mail server / one parser) then optimise when the need arises. – Steve Mayne Mar 13 '11 at 11:22
You can use www.slamd.com to test the load your system can handle. – adamo Mar 13 '11 at 20:24

score 1 · Answer 2 · answered Mar 13 '11 at 20:22

From what I understand you are about to write an SMTP listener which depending on the value of the RCPT TO: command is going to accept or reject the connection. Upon accepting a connection it is going to parse the data accompanying the DATA command and the parser's result is going to signal the relevant application that is going to use the data, right?

Why do you need any physical mailbox at all? Upon each connection your listener spawns a handler child which parses and notifies the registered application. Depending on your load you may or may not need to maintain a queue locally (or send a 4xx error to the sender so that they retry delivery) or you may need to maintain a queue for each registered application. Assuming one mailbox per application, what you are thinking of is like using each mailbox as a separate queue for each application. Using IMAP to detect mailbox changes would simply mean that you have to write yet another listener (an IMAP one) that would check the mailboxes. IMHO, this is unnecessary overkill.
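To make the accept, reject or defer decision concrete, here is a minimal sketch of the reply logic only, with no network code. The `REGISTERED` table, the queue size and both function names are assumptions for illustration. The reply codes follow the usual SMTP convention that 5xx is a permanent failure and 4xx asks the sender to retry later.

```python
import queue

# Hypothetical registry mapping recipient addresses to application names.
REGISTERED = {
    "orders@example.com": "orders-app",
    "support@example.com": "support-app",
}

# Bounded local queue feeding the parser; the size of 2 is arbitrary.
work = queue.Queue(maxsize=2)


def rcpt_reply(address: str) -> str:
    """Reply to RCPT TO: accept known recipients, permanently reject others."""
    if address.lower() in REGISTERED:
        return "250 OK"
    return "550 No such recipient"


def data_reply(recipient: str, raw: bytes) -> str:
    """Reply after DATA: queue the message, or defer with a 4xx when full."""
    try:
        work.put_nowait((REGISTERED[recipient.lower()], raw))
    except queue.Full:
        # A temporary failure makes the sending server retry delivery later.
        return "451 Try again later"
    return "250 Message accepted"
```

With a single shared queue as shown, one slow application can delay the others, which is the case where a queue per registered application becomes attractive.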

I do not write .NET but here is what I would do: I would have each application listen on a port (determined upon registration with the SMTP listener). Then when the parser has decided where data is to be forwarded, the handler-child should connect to the relevant port and forward the data to the application. In your case such a solution might be unacceptable, especially if a major rewrite is needed.
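The port-per-application idea could look roughly like this in Python (the original system is .NET, so this is only an illustration of the shape). The `APP_PORTS` table, the port numbers and the `forward` function are hypothetical, and the sketch assumes the application reads until the connection is closed.

```python
import socket

# Hypothetical port assignments made when each application registers.
APP_PORTS = {"orders-app": 9101, "support-app": 9102}


def forward(app: str, payload: bytes, host: str = "127.0.0.1", ports=None) -> None:
    """Connect to the application's registered port and send the parsed data."""
    ports = APP_PORTS if ports is None else ports
    with socket.create_connection((host, ports[app]), timeout=5) as conn:
        conn.sendall(payload)
    # Leaving the with-block closes the socket, which marks the end of the message.
```

If the connection fails, `socket.create_connection` raises `OSError`, at which point the handler would need to queue the message or answer the sender with a 4xx reply.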

You may find helpful advice reading "Programmer's Guide to Internet Mail" even though its code samples are in Visual Basic.

Well, this is my reason for asking really. I was planning to write an SMTP listener, as (like you said) I don't really need a mailbox, but the guys on the other thread said it was too dangerous, and I should use a mailbox instead. Thanks for the link to the book, I'll definitely go give it a read. – Matt Brailsford, Mar 13 '11 at 20:41
What you will need at some point in time is queue management. A mailbox is a cheap way of doing so in your case. – adamo, Mar 13 '11 at 21:18

score 1 · Answer 3 · answered Mar 13 '11 at 20:55

There is a sort-of-standard protocol called LMTP (see RFC 2033) which allows you to implement a mail processor that receives mail from an ordinary SMTP frontend. The frontend will provide hardening for you and will do the queue management. Your LMTP server will receive mails and do parsing and notification.

The advantage of this solution is that you can stick a load balancer in between and add as many boxen running your LMTP listener as you need.

I think both Sendmail and Postfix support LMTP. Your favourite server may well support it too.

+1 for suggesting LMTP. However, if you check the other SF post, the implementation is going to be on Windows .NET. This makes integrating Postfix or Sendmail in the solution a little bit more complex. – adamo, Mar 13 '11 at 21:01

score 0 · Answer 4 · answered Jan 15 '12 at 20:55

Have you considered using two SMTP servers? First, expose a standard smtpd (like Postfix) to the internet. Have it configured to accept only mail to the correct addresses, etc. Then have it pass messages to an smtpd that you write, which isn't exposed externally. You can rely on the Postfix instance to do queue management, filtering, etc. Since your daemon would only be talking to a trusted Postfix instance, you wouldn't have to worry as much about a bug in it being externally exploitable. (Of course, you should write paranoid code anyway.)

Another (probably better) option would be to have Postfix deliver to a local process that you write, which passes messages to your backend server using a simpler protocol than SMTP.
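As a rough sketch of such a local process: the script below reads one raw message, extracts a few fields and emits a JSON document that could be handed to the backend. The JSON layout, `to_payload` and `main` are assumptions for illustration. How the message reaches the script (for example, on standard input with the recipient as an argument) depends on how the mail server's delivery is configured.

```python
import json
from email import message_from_bytes, policy


def to_payload(raw: bytes, recipient: str) -> str:
    """Turn a raw RFC 5322 message into a small JSON document."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return json.dumps({
        "recipient": recipient,
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        # Messages without a text/plain part yield an empty string here.
        "text": body.get_content() if body is not None else "",
    })


def main(argv, stdin, stdout) -> int:
    """argv[1] is the envelope recipient; stdin is a binary stream of the message."""
    stdout.write(to_payload(stdin.read(), argv[1]) + "\n")
    return 0
```

A real entry point would call `main(sys.argv, sys.stdin.buffer, sys.stdout)` and replace the write to `stdout` with whatever transport the backend uses.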

Either one should be pretty easily scalable. You can add extra proxy servers using multiple MX records, and extra backend servers using (depending on how you configure Postfix to talk to them) another set of MX records, DNS round robin, or some other mechanism.
