8

I have built an application to send email mailers for a website through Amazon SES. It is coded in C#.

Each email takes 0.3 seconds to send via the Amazon SES API. That means a single-threaded application can send only about 3 emails per second.

I have implemented a multi-threaded producer/consumer application with 1 producer that queries and customizes the email for each customer, and 25 consumers that pull from the queue and send the emails.

My multi-threaded application sends 12 emails per second, a fourfold speedup. I would have expected a greater speedup from a 25-thread application.

My question is: How much can I really speed up the sending of a mailer on a single-processor machine? Do my gains seem reasonable, or is my speed problem more likely due to my code than to the computer's inability to process the emails more quickly?

Thanks in advance!

UPDATE: In case others are facing the same issue: connecting to AWS in order to send each email takes up a lot of time. The following thread on the AWS Developer Forums gives some insight (you may need to scroll down to get to the more useful posts).

https://forums.aws.amazon.com/thread.jspa?threadID=78737

Rebecca
  • From what I understand about multithreading, it can process multiple tasks, but they still take the same amount of time. So I don't think there would be a significant time saving using multithreading. However, multithreading can be used if a UI still needs to be accessible while the emails are sending. – craig1231 Jan 01 '12 at 23:13
  • How many cores do you have in your system? – Tudor Jan 01 '12 at 23:14
  • What happens to the queue count during a typical mailshot? It might be interesting to see whether the producer thread is outperforming the emailers or not. Dump the P-C queue count to the screen on a timer, every second perhaps. – Martin James Jan 02 '12 at 02:36
  • @MartinJames Yes, the Producer thread is way outperforming the consumers... the producer finishes formatting the emails after only a few emails have been sent. – Rebecca Jan 03 '12 at 16:52

8 Answers

4

You can speed this up a lot, even though it's a single-processor machine.

Sending an email does not consume a lot of CPU; it's an I/O-bound operation. Therefore you will increase your performance very much by doing the work in parallel.
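A quick way to see this on any machine is to simulate the network wait. In the sketch below, Task.Delay(300) is a hypothetical stand-in for one 0.3-second SES call (no email is sent). Because waiting on a timer uses no CPU, 25 concurrent waits should finish in roughly 0.3 seconds rather than the 7.5 seconds that 25 sequential calls would take, even on a single core.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class IoOverlapDemo
{
    // Hypothetical stand-in for one SES call: waits 0.3 s on a timer and uses no CPU.
    static Task SimulatedSendAsync() => Task.Delay(TimeSpan.FromMilliseconds(300));

    // Starts `count` simulated sends at once and returns how long they took in total.
    public static async Task<TimeSpan> SendConcurrentlyAsync(int count)
    {
        var sw = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, count).Select(_ => SimulatedSendAsync()));
        return sw.Elapsed;
    }

    public static async Task Main()
    {
        TimeSpan elapsed = await SendConcurrentlyAsync(25);
        Console.WriteLine($"25 overlapping 0.3 s waits took {elapsed.TotalSeconds:F2} s");
    }
}
```

Real sends also spend some CPU time on TLS and message formatting, so the gain in practice is likely to be smaller, but the waiting part overlaps in the same way.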

Maxim
  • +1 for big speedup on even single-processor box with many threads I/O waiting. – Martin James Jan 02 '12 at 02:27
  • @SurjitSamra - Not sure what you are asking. He uses some Amazon API which I'm not familiar with. – Maxim Jan 02 '12 at 22:56
  • How can you deny your own statement? Read the last line of your own answer, where you say "you will increase your performance very much by doing the work in parallel." So I am asking: how do you suggest doing that? – Surjit Samra Jan 03 '12 at 09:20
  • @SurjitSamra - In my post I don't propose any implementation. Anyway, doing things in parallel may be done in many ways, such as the .NET TPL, creating new threads, using the ThreadPool, calling the async implementation of a method (BeginXXX, EndXXX), or the event-based pattern. – Maxim Jan 03 '12 at 13:46
  • -1 => You yourself are suggesting TPL now, after downvoting my answer. – Surjit Samra Jan 03 '12 at 15:45
  • Speed was definitely increased by using multiple threads in parallel, as mentioned in the original post. The application is 4x as fast now, but I would have assumed a greater increase... no? – Rebecca Jan 03 '12 at 16:53
  • @Rebecca - How many emails are you sending? How fast does the producer produce them? How fast is your network? It depends on so many factors that it's hard to give you numbers. You'll have to try it yourself and see what works best for you. – Maxim Jan 03 '12 at 21:38
3

I blogged about my solution. Basically, you use a Parallel.ForEach loop with MaxDegreeOfParallelism set. Don't forget to increase the maxconnection count in app.config.

Below is the app.config sample:

<system.net>
    <connectionManagement>
        <add address="*" maxconnection="392" />
    </connectionManagement>
    <mailSettings>
        <smtp from="form@company.com" deliveryMethod="Network">
            <network host="email-smtp.us-east-1.amazonaws.com" userName="SmtpUsername" password="SmtpPassword" enableSsl="true" port="587" />
        </smtp>
    </mailSettings>
</system.net>

And here is the Parallel.ForEach loop sample:

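using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    private static readonly int maxParallelEmails = 196;

    static void Main(string[] args)
    {
        // _emailerService and Model.SendEmailTo are defined elsewhere in the application.
        IList<Model.SendEmailTo> recipients = _emailerService.GetEmailsToSend();
        int cnt = 0;
        int totalCnt = recipients.Count;

        // Parallel.ForEach partitions the list itself, so AsParallel() is not needed.
        Parallel.ForEach(recipients, new ParallelOptions { MaxDegreeOfParallelism = maxParallelEmails }, recipient =>
        {
            // Do any other logic

            // Build the email HTML

            // Send the email, make sure to log exceptions

            // Track email, etc

            // Interlocked.Increment returns the new value atomically, so each
            // thread prints its own count without reading a shared field later.
            int sent = Interlocked.Increment(ref cnt);
            Console.WriteLine("{0}/{1} - Sent newsletter email to: {2}", sent, totalCnt, recipient.Email);
        });
    }
}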

My blog explains it in more detail: http://michaeldimoudis.com/blog/2013/5/25/reliably-and-speedily-send-mass-emails-via-amazon-ses-in-c

dimoss
2

My question is: How much can I really speed up the sending of a mailer on a single-processor machine? Do my gains seem reasonable, or is my speed problem more likely due to coding than to the computer's inability to process the emails more quickly?

Broadly speaking, a 4x speedup for a 25x increase in thread count isn't outrageous, but it's not great either.
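The numbers in the question can be checked with Little's law (requests in flight = throughput × time per request). In this small sketch, 25 consumers at 0.3 s per send could in principle reach about 83 emails per second, whereas the observed 12 per second implies roughly 2 seconds per send. That inference assumes all 25 consumers stay busy, which the comments on the question suggest (the producer runs far ahead of them), so the time per request itself appears to grow under load. That would be consistent with throttling or slow connection setup rather than a CPU limit.

```csharp
using System;

public static class ThroughputMath
{
    // Little's law: throughput = requests in flight / time per request.
    public static double IdealEmailsPerSecond(int threads, double secondsPerSend)
        => threads / secondsPerSend;

    // Inverted: the time per send implied by an observed throughput,
    // assuming every thread is busy sending the whole time.
    public static double ImpliedSecondsPerSend(int threads, double observedPerSecond)
        => threads / observedPerSecond;

    public static void Main()
    {
        Console.WriteLine($"{IdealEmailsPerSecond(1, 0.3):F1}");  // about 3.3 emails/s, single-threaded
        Console.WriteLine($"{IdealEmailsPerSecond(25, 0.3):F1}"); // about 83.3 emails/s if each send stayed at 0.3 s
        Console.WriteLine($"{ImpliedSecondsPerSend(25, 12):F2}"); // about 2.08 s per send at the observed 12 emails/s
    }
}
```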

A single CPU will only become a bottleneck when your CPU usage is high. You can tell whether that's an issue for you by looking at total CPU use when your app is running. In theory, sending bulk emails should be an I/O limited operation; if that's not the case for you, then your code may have issues.
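One way to check this from inside the app is to compare the process's CPU time with wall-clock time over a batch of sends. This is a sketch that assumes nothing else heavy runs in the same process; the workload in Main is a hypothetical one made of simulated waits. On a single core, a ratio close to 1.0 means the CPU is the bottleneck, while a ratio well below 1.0 means the threads are mostly waiting.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class CpuCheck
{
    // Runs the workload and returns (process CPU time) / (wall-clock time).
    // On a single core, near 1.0 means CPU-bound; well below 1.0 means mostly waiting.
    public static async Task<double> CpuToWallRatioAsync(Func<Task> workload)
    {
        using var process = Process.GetCurrentProcess();
        TimeSpan cpuBefore = process.TotalProcessorTime;
        var wall = Stopwatch.StartNew();

        await workload();

        process.Refresh(); // discard any cached process information before re-reading
        TimeSpan cpuUsed = process.TotalProcessorTime - cpuBefore;
        return cpuUsed.TotalMilliseconds / wall.Elapsed.TotalMilliseconds;
    }

    public static async Task Main()
    {
        // Hypothetical workload: 25 simulated 0.3 s sends that only wait on timers.
        double ratio = await CpuToWallRatioAsync(
            () => Task.WhenAll(Enumerable.Range(0, 25).Select(_ => Task.Delay(300))));
        Console.WriteLine($"CPU time / wall time: {ratio:F2}");
    }
}
```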

Although I haven't used Amazon SES, I know that other Amazon products definitely use various forms of bandwidth / request throttling. It's possible, even likely, that your throughput is being limited more by Amazon than by your app.
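If the provider is capping the rate, extra threads only pile up behind the cap. A client-side limiter lets the sender stay at a chosen rate instead. The sketch below spaces sends evenly at a hypothetical maxPerSecond; the actual limit for an SES account is something you would need to look up for your own account, and the SES call itself is left as a comment.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Spaces permitted sends at least 1/maxPerSecond apart, shared across all threads.
public sealed class SendRateLimiter
{
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    private readonly object _gate = new object();
    private readonly long _intervalTicks; // Stopwatch ticks between two sends
    private long _nextSlot;               // earliest tick at which the next send may start

    public SendRateLimiter(double maxPerSecond)
    {
        _intervalTicks = (long)(Stopwatch.Frequency / maxPerSecond);
    }

    public async Task WaitAsync()
    {
        long slot;
        lock (_gate)
        {
            // Reserve the earliest free slot, then push the next one out by one interval.
            slot = Math.Max(_clock.ElapsedTicks, _nextSlot);
            _nextSlot = slot + _intervalTicks;
        }
        long waitTicks = slot - _clock.ElapsedTicks;
        if (waitTicks > 0)
            await Task.Delay(TimeSpan.FromSeconds((double)waitTicks / Stopwatch.Frequency));
    }
}

public static class RateLimitDemo
{
    // Launches `count` concurrent senders that share one limiter; returns the total time.
    public static async Task<TimeSpan> SendAsync(int count, double maxPerSecond)
    {
        var limiter = new SendRateLimiter(maxPerSecond);
        var sw = Stopwatch.StartNew();
        var sends = new Task[count];
        for (int i = 0; i < count; i++)
        {
            sends[i] = Task.Run(async () =>
            {
                await limiter.WaitAsync();
                // The real SES call would go here.
            });
        }
        await Task.WhenAll(sends);
        return sw.Elapsed;
    }

    public static async Task Main()
    {
        TimeSpan t = await SendAsync(count: 20, maxPerSecond: 10);
        Console.WriteLine($"20 sends at no more than 10 per second took {t.TotalSeconds:F1} s");
    }
}
```

Reserving slots under a lock (rather than sleeping inside it) keeps the lock held only for a couple of arithmetic operations, so many threads can wait for their slots at the same time.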

I wrote a high-performance bulk mail app a while back, and what I did was:

  1. Used async I/O as much as possible, in addition to multiple threads. That way, if one request is slow, it doesn't consume an entire thread.
  2. Sent the email directly to the end servers, rather than through an intermediate gateway. That required using P/Invoke to call DNS to retrieve the requisite MX or A records. After that, I used the standard SmtpClient class (which has a SendAsync method) to actually send the mail.

This approach also lets me see and record errors when sending the mail, which in turn provides better feedback to the users. The alternative is to rely on receiving and parsing error mail from the gateway server, which is error-prone, to say the least.
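The first point (async I/O with a cap on how many sends are in flight) might look like the sketch below, using SmtpClient.SendMailAsync, the Task-based counterpart of the SendAsync method mentioned above. It does not include the direct-to-MX delivery from point 2. For a local test it writes .eml files to a pickup directory instead of using the network; the sender address, subject, body, and folder name are placeholders. Because a single SmtpClient instance handles only one send at a time, each send gets its own client.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net.Mail;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncSmtpSketch
{
    // Sends every message with SendMailAsync, allowing at most maxConcurrent in flight.
    public static async Task SendAllAsync(string[] recipients, int maxConcurrent, Func<SmtpClient> createClient)
    {
        using var throttle = new SemaphoreSlim(maxConcurrent);
        var sends = recipients.Select(async to =>
        {
            await throttle.WaitAsync();
            try
            {
                // One client per send: SmtpClient does not allow overlapping sends.
                using var client = createClient();
                using var message = new MailMessage("from@example.com", to, "Newsletter", "Hello!");
                await client.SendMailAsync(message);
            }
            finally
            {
                throttle.Release();
            }
        });
        await Task.WhenAll(sends);
    }

    public static async Task Main()
    {
        // Local test: write .eml files to a folder instead of talking to a server.
        string dir = Path.Combine(Path.GetTempPath(), "mail-sketch");
        Directory.CreateDirectory(dir);
        string[] recipients = Enumerable.Range(0, 10).Select(i => $"user{i}@example.com").ToArray();

        await SendAllAsync(recipients, maxConcurrent: 4, () => new SmtpClient
        {
            DeliveryMethod = SmtpDeliveryMethod.SpecifiedPickupDirectory,
            PickupDirectoryLocation = dir
        });
        Console.WriteLine($"{Directory.GetFiles(dir, "*.eml").Length} .eml files in {dir}");
    }
}
```

To send for real, the factory would instead return a network client, for example new SmtpClient("email-smtp.us-east-1.amazonaws.com", 587) { EnableSsl = true, Credentials = new NetworkCredential(user, password) }, which corresponds to the SES SMTP settings in the app.config shown in another answer.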

RickNZ
  • +1 - So far, this is the answer that best addresses the question and provides the most detailed and specific information. Thanks! Will try some of these suggestions and come back to this thread to post the results! – Rebecca Jan 03 '12 at 16:58
1

In a multithreaded application running on a multi-core (or multiprocessor) system, the golden rule is that (generally) you cannot achieve a speedup greater than N over the sequential execution time, where N is the number of cores. So if an activity takes 12 seconds and you run it in parallel on 4 cores, you cannot do better than 3 seconds in total.

Conversely, if previously you could execute one activity in a single unit of time, with 4 cores you cannot do better than 4 activities in the same unit of time.

Furthermore, this upper bound is not always achieved due to several factors that generally impact the performance of parallel programs: disk I/O bottlenecks, memory saturation, lock contention, etc.

Tudor
  • Well, that's only true for CPU-limited problems. For example, I have code that queries servers in parallel which has a MUCH higher speedup than the number of cores (and also uses many more threads). But without some profiling to find where the bottleneck is, there's not much we can say... – Voo Jan 01 '12 at 23:22
  • This answer is not applicable to the OP's situation. – quentin-starin Jan 02 '12 at 00:28
1

Producer/consumer with only one queue doesn't scale well. The queue becomes the bottleneck as you add more consumers or producers.

If you have a multiprocessor architecture, you can use multiple processes to send emails. You can still use your multithreaded producer/consumer version, but now there will be one per process; this will speed things up a bit (as Tudor explained), but the problem remains.

However, you might have, for the entire system, only one network manager or similar entity that sends the messages (say, HTTP messages) and one network card. Then the bottleneck could be this network manager. I'd like to know more about the architecture of the system :)

Adrian
  • The time spent on P-C queue activity is insignificant compared with the I/O time and latency taken to set up TCP connections and send email. I would be amazed if queue operations ever became a bottleneck in this app. – Martin James Jan 02 '12 at 02:30
  • You can't guarantee that producers don't take "a lot" of time. I agree with you that it is possible the network manager is a bottleneck. However, the point remains that a queue approach doesn't scale too well with lots of producers and consumers. – Adrian Jan 02 '12 at 03:27
  • Seems like the producer finishes very quickly compared to the consumers.... this does not seem to be the bottleneck, but a good point. – Rebecca Jan 03 '12 at 19:36
  • @Rebecca It's great to know more. It seems you cannot speed up your app more than 4x because the consumers can't send the emails any faster. Martin was right in saying that the bottleneck is the connections. Inspect the queue (its size). I expect it to keep growing... perhaps into out-of-memory land if you produce a lot. – Adrian Jan 03 '12 at 20:33
  • @Adrian So you think I might be running out of memory, and therefore the queue operations are taking longer? Would it then make sense to slow down the producer a bit if the queue gets too big? – Rebecca Jan 03 '12 at 20:36
  • @Rebecca No, when you run out of memory your app crashes. I think it's because you can't consume the messages as fast as you produce them. So the problem is with the consumer: it just can't keep up. If this is indeed the problem, eventually your app will crash due to lack of memory. – Adrian Jan 03 '12 at 20:41
  • @Rebecca see if you can reuse the same connection instead of making new ones for each email – Adrian Jan 03 '12 at 20:57
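On the question of slowing the producer down when the queue gets too big: BlockingCollection<T> can do that automatically when it is given a bounded capacity. In this sketch (hypothetical addresses, with a 5 ms sleep standing in for the send), Add() blocks whenever the queue is full, so the queue never holds more than boundedCapacity messages.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedQueueSketch
{
    // Returns the largest queue length the producer saw right after adding an item.
    public static int Run(int emailCount, int boundedCapacity, int consumerCount)
    {
        // With a bounded capacity, Add() blocks while the queue is full, so a fast
        // producer waits for the consumers instead of piling up unsent messages.
        using var queue = new BlockingCollection<string>(boundedCapacity);
        int maxSeen = 0; // written only by the producer task

        var producer = Task.Run(() =>
        {
            for (int i = 0; i < emailCount; i++)
            {
                queue.Add($"customer{i}@example.com");
                maxSeen = Math.Max(maxSeen, queue.Count);
            }
            queue.CompleteAdding(); // lets the consumers' loops end once the queue drains
        });

        var consumers = Enumerable.Range(0, consumerCount)
            .Select(_ => Task.Run(() =>
            {
                foreach (var address in queue.GetConsumingEnumerable())
                    Thread.Sleep(5); // hypothetical stand-in for the SES call
            }))
            .ToArray();

        Task.WaitAll(consumers.Append(producer).ToArray());
        return maxSeen;
    }

    public static void Main()
    {
        int maxSeen = Run(emailCount: 500, boundedCapacity: 50, consumerCount: 25);
        Console.WriteLine($"Queue never held more than {maxSeen} messages");
    }
}
```

Since the producer in the question finishes long before the consumers, bounding the queue would cap memory use but would not by itself make the sends any faster.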
0

I was in a similar situation a few months ago. Although we would need to know many more factors to tell you which one is causing the lower performance, you can try sending the emails from a micro EC2 instance.

That turned out to work well in my case, and it was a suitable solution since I was working on a web application.

Mr Programmer
0

The task is neither CPU-bound nor I/O-bound. The task makes a request to SES to send an email (with limited data or I/O) and then waits. So, use the largest number of threads you can for the available RAM.

-2

As commented, this is an I/O problem, so you need to find a good number of jobs for your infrastructure / bandwidth.

Use a queue pattern.

Example:

1 - Enqueue an email to deliver

2 - "N" jobs dispatch the email
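The two steps above can be sketched in C# with BlockingCollection<T>, which handles the locking between the enqueuing side and the N dispatching jobs. This is only a sketch: SendEmail is a hypothetical placeholder that sleeps for 0.3 seconds (the per-email time from the question) instead of calling SES, and the addresses are made up.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class QueueMailerSketch
{
    // Hypothetical stand-in for the SES call: blocks for about 0.3 s like the real API.
    static void SendEmail(string address) => Thread.Sleep(300);

    public static int Run(int emailCount, int consumerCount)
    {
        var queue = new BlockingCollection<string>();
        int sent = 0;

        // Step 1: one producer builds each message and enqueues it.
        var producer = Task.Run(() =>
        {
            for (int i = 0; i < emailCount; i++)
                queue.Add($"customer{i}@example.com");
            queue.CompleteAdding(); // tells the consumers that no more items are coming
        });

        // Step 2: N jobs take messages until the queue is empty and completed.
        // LongRunning gives each job its own thread, since the sends block.
        var consumers = Enumerable.Range(0, consumerCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                foreach (var address in queue.GetConsumingEnumerable())
                {
                    SendEmail(address);
                    Interlocked.Increment(ref sent);
                }
            }, TaskCreationOptions.LongRunning))
            .ToArray();

        Task.WaitAll(consumers.Append(producer).ToArray());
        return sent;
    }

    public static void Main()
    {
        var sw = System.Diagnostics.Stopwatch.StartNew();
        int sent = Run(emailCount: 100, consumerCount: 25);
        Console.WriteLine($"Sent {sent} emails in {sw.Elapsed.TotalSeconds:F1} s");
    }
}
```

CompleteAdding is what lets GetConsumingEnumerable finish cleanly; without it, the consumers would block forever waiting for more work once the queue is drained.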