
I'm currently programming a client application and I'm wondering whether I should use the Socket class' ReceiveAsync or BeginReceive method. I have been using the latter so far; however, it seems to stress the CPU quite a bit. Here is what my receive loop basically looks like:

private void socket_ReceiveCallback(IAsyncResult result_)
{
    // does nothing else at the moment
    socket.EndReceive(result_);
    byte[] buffer = (byte[])result_.AsyncState;

    // receive new packet
    byte[] newBuffer = new byte[1024];
    socket.BeginReceive(newBuffer, 0, newBuffer.Length, SocketFlags.None, 
                        socket_ReceiveFallback, newBuffer);
}

Now I've been wondering if I am doing something wrong here, since other communicating applications hardly stress the CPU at all. I'm also wondering whether I would be better off using SocketAsyncEventArgs and ReceiveAsync.

So here are my questions:

  • Why is my loop stressing the CPU so much?
  • Should I use SocketAsyncEventArgs and ReceiveAsync instead of BeginReceive?

haiyyu
  • I bet it is because you are calling your method in a loop where there is no blocking, so you are maxing out one of your cores - use the ManualResetEvent class to do manual blocking if needed – markmnl May 14 '12 at 14:56
  • 3
    Is "socket_ReceiveFallback" indeed a different method in you implementation? – markmnl May 14 '12 at 14:59

4 Answers


BeginReceive and EndReceive are remnants of the legacy asynchronous programming model (APM) that was used before the introduction of the async and await keywords in C# 5.

So you should prefer to use ReceiveAsync over BeginReceive and EndReceive for asynchronous programming.
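As a rough illustration (not part of the original answer), a receive loop built on the Task-returning ReceiveAsync overload could look like the sketch below. It assumes a connected socket and a runtime that exposes the ArraySegment<byte> overload; HandlePacket is a hypothetical placeholder for your own processing.

using System;
using System.Net.Sockets;
using System.Threading.Tasks;

// Minimal sketch of an await-based receive loop.
private async Task ReceiveLoopAsync(Socket socket)
{
    byte[] buffer = new byte[1024];
    while (true)
    {
        // Returns the number of bytes read; 0 means the peer closed the connection.
        int bytesRead = await socket.ReceiveAsync(
            new ArraySegment<byte>(buffer), SocketFlags.None);
        if (bytesRead == 0)
            break;

        HandlePacket(buffer, bytesRead); // hypothetical processing method
    }
}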

For really high-performance scenarios you should use SocketAsyncEventArgs. It was designed for exactly that purpose and is used by the Kestrel web server.

From the Remarks section of the SocketAsyncEventArgs documentation:

The SocketAsyncEventArgs class is part of a set of enhancements to the System.Net.Sockets.Socket class that provide an alternative asynchronous pattern that can be used by specialized high-performance socket applications. This class was specifically designed for network server applications that require high performance. An application can use the enhanced asynchronous pattern exclusively or only in targeted hot areas (for example, when receiving large amounts of data).

The main feature of these enhancements is the avoidance of the repeated allocation and synchronization of objects during high-volume asynchronous socket I/O. The Begin/End design pattern currently implemented by the System.Net.Sockets.Socket class requires a System.IAsyncResult object be allocated for each asynchronous socket operation.

In the new System.Net.Sockets.Socket class enhancements, asynchronous socket operations are described by reusable SocketAsyncEventArgs objects allocated and maintained by the application. High-performance socket applications know best the amount of overlapped socket operations that must be sustained. The application can create as many of the SocketAsyncEventArgs objects that it needs. For example, if a server application needs to have 15 socket accept operations outstanding at all times to support incoming client connection rates, it can allocate 15 reusable SocketAsyncEventArgs objects for that purpose.
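To make the reuse concrete, here is a minimal sketch (not from the documentation) of a receive loop driven by a single reusable SocketAsyncEventArgs object. It assumes a connected socket; the 8192-byte buffer size is arbitrary and ProcessReceivedData is a hypothetical placeholder.

using System;
using System.Net.Sockets;

// The args object and its buffer are allocated once and reused for every receive.
private void StartReceiving(Socket socket)
{
    var args = new SocketAsyncEventArgs();
    args.SetBuffer(new byte[8192], 0, 8192);
    args.Completed += (sender, e) => OnReceiveCompleted(socket, e);

    // ReceiveAsync returns false if the operation completed synchronously;
    // in that case the Completed event is not raised, so handle it here.
    if (!socket.ReceiveAsync(args))
        OnReceiveCompleted(socket, args);
}

private void OnReceiveCompleted(Socket socket, SocketAsyncEventArgs e)
{
    while (true)
    {
        if (e.SocketError != SocketError.Success || e.BytesTransferred == 0)
            return; // error, or the remote side closed the connection

        ProcessReceivedData(e.Buffer, e.Offset, e.BytesTransferred); // hypothetical

        // Reuse the same args object for the next receive.
        if (socket.ReceiveAsync(e))
            return; // pending: the Completed handler will continue the loop
        // completed synchronously: loop and process the result inline
    }
}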

Fred
  • I investigated this recently because of this answer. It seems that SocketAsyncEventArgs came about for Silverlight. As Silverlight's end of life is 2021, I would be cautious about adopting this now. – paulecoyote Jul 22 '20 at 22:45
  • @paulecoyote I don't think it is going away. It is still available in .NET 5, and I think it is in use by Kestrel. – Fred Jan 21 '21 at 12:50

I have been benchmarking synchronous vs. asynchronous sockets on a localhost loopback connection. My results were that the asynchronous version was about 30% slower. That was surprising to me considering that async IO is all the rage now. It didn't matter how many threads I used: even with 128 threads, synchronous IO was still faster.

The reason for that is, I believe, that async IO requires more allocations and more kernel mode transitions.

So you could just switch to synchronous IO, if you don't expect hundreds of simultaneous connections.
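For a client talking to a single server, the synchronous version can be as simple as a blocking loop on a dedicated thread. A minimal sketch (HandlePacket is a hypothetical placeholder; the buffer size is arbitrary):

using System.Net.Sockets;
using System.Threading;

// Blocking receive loop on a background thread, assuming a connected Socket.
private void StartReceiveThread(Socket socket)
{
    var thread = new Thread(() =>
    {
        byte[] buffer = new byte[8192];
        while (true)
        {
            // Blocks until data arrives; returns 0 when the peer closes the connection.
            int bytesRead = socket.Receive(buffer, 0, buffer.Length, SocketFlags.None);
            if (bytesRead == 0)
                break;

            HandlePacket(buffer, bytesRead); // hypothetical processing method
        }
    });
    thread.IsBackground = true;
    thread.Start();
}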

usr
  • I am only connecting to a single server. So I could basically just use the regular Receive and Send methods? If I want it to be asynchronous, I could create the whole class in a different thread which would probably not end up being so much slower. – haiyyu Mar 28 '12 at 20:33
  • 1
    I would definitely benchmark this option. My guess is the synchronous version will be faster. And more maintainable! – usr Mar 28 '12 at 20:35
  • I will do it and post the results afterwards. Also, I meant to say "I could create the whole object (not class) in a different thread." – haiyyu Mar 28 '12 at 20:40
  • async will be slightly slower processing one client - but if you have more, especially many more - say thousands - dealing with one client's request at a time would be drastically slower over the Internet as you wait for their data to come in. – markmnl May 14 '12 at 14:58
  • 1
    How many concurrent connections did you try with? The async methods is mainly for server applications. – jgauffin Jul 01 '13 at 13:53
  • @usr: This is correct. Async causes context switching, which adds thousands of extra CPU cycles. High bandwidth requirements dictate you should use sync mode and fall back to async when data isn't immediately available. The best implementation we found is a [two phase waiting](https://msdn.microsoft.com/en-us/library/ee722116(v=vs.110).aspx) where we use a SpinWait object while checking on data availability, and do an async call when SpinWait indicates it's about to do a context switch (check the NextSpinWillYield property). – Max Oct 22 '16 at 06:08
  • @John there's also the option to query `int DataAvailable` to read that much data synchronously, then issue an async read (possibly for size 1). I have never seen a case where this technique was required for performance but I'm sure such places exist. – usr Oct 22 '16 at 11:41
  • @usr: correct, that was what I was alluding to by using "sync mode" when data is available. – Max Nov 28 '16 at 19:07
  • The answer was written 5 years ago. Is this still the case now that Tasks got massively improved (reduced memory size and other things) in .NET 4.5? – Riki Apr 16 '17 at 04:41
  • @Felheart good question. Async IO performs strictly more work than sync IO (except under extreme load, e.g. >90% CPU usage, which no sane production app runs at). So it should still be slower - probably less slow now at the managed level. I don't think anything was improved to save kernel-mode work. An async IO operation calls the kernel twice; a sync one calls it once. – usr Apr 17 '17 at 11:26
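As an illustration of the hybrid approach from the comments above (a sketch, not code from any of the commenters): read synchronously while Socket.Available reports buffered data, and only pay for an asynchronous wait when nothing is pending. HandlePacket is a hypothetical placeholder, and the Task-returning ReceiveAsync overload is assumed to be available.

using System;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hybrid receive loop: synchronous reads while data is buffered, async wait otherwise.
private async Task HybridReceiveLoopAsync(Socket socket)
{
    byte[] buffer = new byte[8192];
    while (true)
    {
        int bytesRead;
        if (socket.Available > 0)
        {
            // Data is already buffered locally, so this call will not block.
            bytesRead = socket.Receive(buffer, 0, buffer.Length, SocketFlags.None);
        }
        else
        {
            // Nothing pending: wait asynchronously for the next data.
            bytesRead = await socket.ReceiveAsync(
                new ArraySegment<byte>(buffer), SocketFlags.None);
        }

        if (bytesRead == 0)
            break; // peer closed the connection

        HandlePacket(buffer, bytesRead); // hypothetical processing method
    }
}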

I did a comparison at maximum load; the results are in GB/s (gigabytes per second):

  • ReceiveAsync: ~1.2 GB/s
  • BeginReceive: ~1.1 GB/s
  • Receive (in a thread loop): ~1.4 GB/s

Notes:

  • All results were measured over the loopback address (localhost), using a dedicated thread for the sending socket
  • 8192 bytes for buffer size

For large transfers I would suggest using Receive in a dedicated thread, but for better CPU usage across many connections I would use ReceiveAsync or BeginReceive.

Magus

To answer this you'd have to profile your application. What I wonder is:

  • why I see no EndReceive
  • why you don't use the received buffer at all and
  • why you allocate new buffers time and time again - this is the only operation here that should take any resources (CPU/memory)

Have a look at this: http://msdn.microsoft.com/de-de/library/dxkwh6zw.aspx
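To illustrate the third point, the callback from the question could reuse the buffer it receives through AsyncState instead of allocating a new 1 KB array on every receive. A sketch of that change (not the answerer's code), still without any message framing:

using System;
using System.Net.Sockets;

// Reuses one buffer, allocated once before the first BeginReceive and passed as state.
private void socket_ReceiveCallback(IAsyncResult result_)
{
    byte[] buffer = (byte[])result_.AsyncState;
    int bytesRead = socket.EndReceive(result_);
    if (bytesRead == 0)
        return; // remote side closed the connection

    // ... process buffer[0 .. bytesRead) here ...

    // Hand the same buffer back in instead of allocating a new one.
    socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None,
                        socket_ReceiveCallback, buffer);
}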

Random Dev
  • 51,810
  • 9
  • 92
  • 119
  • 1) Sorry, forgot to add that to the code. It is there in the program. I have edited my post. 2) I am planning on using it. However, even without using it, the server stresses the CPU a lot. 3) So would it be smarter to create a buffer pool? I will do that and test whether it changes anything! Thanks for your help. – haiyyu Mar 28 '12 at 20:35
  • my best guess at this point is that since you transfer only small blocks (1 kB) and you do this without delay, there is not much waiting for the hardware to do its work (even more true if you do this on the same machine), so the system is busy creating your asynchronous calls. But I really recommend finding a good profiler (most have trials), or even using the built-in one (I think VS Professional or greater), to find the weak point – Random Dev Mar 28 '12 at 20:43