Asynchronous Socket Communication & Heap Fragmentation

Question

I wrote a multithreaded socket server application that accepts over 1000 concurrent connections. Recently the application crashed; after analyzing the dump files, I found that the crash was caused by heap corruption. I found the same issue discussed in the following links.

.NET Does NOT Have Reliable Asynchronouos Socket Communication? http://support.microsoft.com/kb/947862

The discussion also suggests three solutions:

1. Put an upper bound on the number of outstanding asynchronous I/O operations that the network application posts.
2. Use Microsoft CCR.
3. Use TPL.

Because of time constraints, I plan to go with #1, but I don't have a clear picture of how to implement it. Can someone give me a good starting point, please?

Also, has anyone used async with TPL to solve this issue?

score 1 · Answer 1 · answered Nov 14 '13 at 13:39

You mean a better starting point than the blog posting that I linked to in the answer that you refer to?

The issue is this:

Memory and other per-operation resources that are used during an async write are often "in use" until the remote peer's TCP stack acks the data and the local stack can complete your async write operation to tell you that you can reuse your buffer.
The local peer has no control over this as it's all governed by the speed at which the remote peer reads data from its socket and the congestion on the link between the two peers.

Because of the above, you need a hard limit on the number of async writes that you have outstanding at any one time. You can track this by incrementing a counter just before you issue an async write and decrementing it in the completion handler.
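As a rough, language-neutral sketch of that counter (written in Python for brevity, not .NET code; the class name `OutstandingWriteCounter` and its methods are made up for illustration), the bookkeeping could look something like this:

```python
import threading


class OutstandingWriteCounter:
    """Tracks how many async writes are in flight against a hard limit."""

    def __init__(self, limit):
        self._limit = limit
        self._count = 0
        self._lock = threading.Lock()  # completions may arrive on other threads

    def try_begin(self):
        """Call just before issuing an async write.

        Returns False (and counts nothing) when the limit is already reached.
        """
        with self._lock:
            if self._count >= self._limit:
                return False
            self._count += 1
            return True

    def complete(self):
        """Call from the write completion handler."""
        with self._lock:
            self._count -= 1

    @property
    def outstanding(self):
        with self._lock:
            return self._count
```

The check and the increment happen under the same lock so that two threads cannot both slip past the limit at once.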

What you do once you hit that limit is up to you. In the original article I favour placing the data to be written into a queue. The queue then acts as the source of data as write completions occur, and once it is empty you can send normally again. Of course, this simply moves the problem: you still have a memory resource that's controlled by the remote peer (the queued data), but you no longer tie up other OS resources as well (non-paged pool, I/O page lock limit, etc.).
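A minimal sketch of the queue approach, again in Python as an illustration only: `QueuedSender` and the `issue_write` callback are hypothetical names standing in for whatever actually starts one async write, and the sketch is single-threaded, so real code would need locking around the shared state.

```python
from collections import deque


class QueuedSender:
    """Caps outstanding writes and queues the overflow in arrival order."""

    def __init__(self, issue_write, limit):
        self._issue_write = issue_write  # hypothetical: starts one async write
        self._limit = limit
        self._outstanding = 0
        self._pending = deque()

    def send(self, data):
        # Only bypass the queue when it is empty, so ordering is preserved.
        if self._outstanding < self._limit and not self._pending:
            self._start(data)
        else:
            self._pending.append(data)

    def on_write_complete(self):
        """Call from the completion handler of each finished write."""
        self._outstanding -= 1
        if self._pending:
            # The queue is the source of data while it is non-empty.
            self._start(self._pending.popleft())

    def _start(self, data):
        self._outstanding += 1
        self._issue_write(data)
```

Each completion frees one slot and immediately refills it from the queue, so the number of writes handed to the OS never exceeds `limit`, while the queued data is ordinary memory that you can cap or monitor yourself.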

Alternatively, you could simply stop your peer from sending when you reach your limit. In that case the API that you build over the async API needs a "can't send at the moment, try again later" return from a send that previously always "worked".

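If you're doing this, I would also seriously look at avoiding the pinned memory issue (buffers are pinned while an async I/O is in progress, and pinned buffers scattered across the heap fragment it) by allocating your buffers as one large contiguous block and handing them out from a pool.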
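To show the layout idea only, here is a small Python sketch; `BufferPool` is a made-up name, and Python has no pinning, so this illustrates the shape of the pool, not the .NET memory behaviour. In .NET the analogous approach would presumably hand out offsets into one large `byte[]`.

```python
class BufferPool:
    """Hands out fixed-size slices of one contiguous block of memory."""

    def __init__(self, buffer_size, count):
        # One allocation up front instead of one allocation per I/O.
        self._block = bytearray(buffer_size * count)
        view = memoryview(self._block)
        self._free = [
            view[i * buffer_size:(i + 1) * buffer_size] for i in range(count)
        ]

    def acquire(self):
        """Returns a free buffer, or None when the pool is exhausted."""
        return self._free.pop() if self._free else None

    def release(self, buf):
        """Returns a buffer to the pool once its I/O has completed."""
        self._free.append(buf)
```

An exhausted pool returning `None` doubles as a natural upper bound on outstanding I/O: no buffer, no new operation.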

In fact, WCF uses BufferManager just for this reason. Never hurts to copy working solutions — Panagiotis Kanavos, Nov 14 '13 at 14:09

score 0 · Answer 2 · answered Nov 14 '13 at 10:06

First, that's a very old KB article. How are you sure you have that particular problem? Second, as Hans Passant answers in the SO question, if you write bad async code, it will bite you. If you don't take care of your resources (and memory buffers are resources), a concurrent program will run into memory errors.

It's very hard to write good concurrent code using raw threads. TPL does make it easier, but it won't fix the bugs you already have. In fact, unless you identify your current problems, you are likely to carry them over to the version that uses TPL.

Without knowing the specific problem that caused your application to crash, I can only make some suggestions:

Use BufferManager to reuse memory buffers instead of allocating new ones.
Use a queue to store requests and process them asynchronously instead of starting a new thread for each request.
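The second suggestion can be sketched roughly as follows. This is a Python illustration rather than .NET code, and `run_workers`, `handle`, and `worker_count` are hypothetical names: a fixed set of worker threads pulls requests from one shared queue, so the thread count stays constant no matter how many requests arrive.

```python
import queue
import threading


def run_workers(requests, handle, worker_count=4):
    """Processes requests on a fixed set of threads fed from one queue."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                break
            outcome = handle(item)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for request in requests:
        work.put(request)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Results come back in completion order, not submission order, so callers that care about ordering would need to tag each request.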

There are other techniques you can use as well, depending on the type of application you are building. For example, you could use TPL Dataflow to break processing into independent steps.

As for CCR, there is not much point in using it outside Robotics Studio. TPL contains most of the relevant functionality you need to write concurrent apps.
