.NET 2.0 ThreadPool Thread Stack Increase Causing Out of Memory Exception (Commit Limit Reached)

Question

OS: Windows 7 Embedded

RAM: 1 GB

Paging File Size: 500 MB

Remaining Disk Space: ~1 GB

.NET: 2.0

I am working on a .NET 2.0 WinForms application written in C# that runs on Windows 7 Embedded, where the system has only 1 GB of RAM and fairly limited free disk space (~1 GB). We are migrating to Windows 7 Embedded from Windows XP Embedded, and while our program works well on Windows XP, it has failed several times with an out-of-memory exception on Windows 7. We have added a 500 MB paging file since the failures (the default for Windows 7 Embedded seems to be a paging file size of 0 MB). However, because we have traced the out-of-memory exception to growth in the program's committed memory, the system commit limit could still be reached over a long enough period, even with the paging file. We cannot quickly migrate our hardware to something more appropriate, and must find a software fix before the next release.

Using the Sysinternals VMMap tool, we can see that the number of thread stacks in the program's virtual memory slowly increases over time, eventually causing a failure when the program's committed memory pushes the system past its commit limit. We have isolated the thread stack increase to the .NET 2.0 ThreadPool, which for some reason creates more threads than it retires over time. We think this is caused by overuse of the System.Timers.Timer class in our code, with each instance running its Elapsed event handler on the ThreadPool, though it is still not clear why the ThreadPool keeps so many threads around when the timer callbacks presumably do not need them all the time. Some of these event handlers run for longer than is ideal for a ThreadPool thread, with the worst offenders even calling Thread.Sleep().

We have thought up several possible solutions for this problem, including putting a limit on the number of ThreadPool worker threads, swapping the longer running timer callbacks for threads, and migrating to a higher version of .NET. The last solution assumes that optimizations to the ThreadPool manager have been made over iterations of .NET, which may help to alleviate the problem. Are there any other obvious (or non-obvious) solutions that we have missed?
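One of the options listed above, swapping a long-running timer callback for a thread, might look roughly like the sketch below. It is only an illustration that sticks to C# 2.0 syntax: `DedicatedPoller` and its parameters are hypothetical names, and the stack size passed to the `Thread(ThreadStart, int)` constructor is a value to tune, not a recommendation. The point is that each periodic task owns exactly one thread, so the number of stacks stays fixed no matter how long the work takes.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs one periodic task on its own thread
// instead of on a ThreadPool thread via System.Timers.Timer.
public sealed class DedicatedPoller : IDisposable
{
    private readonly ManualResetEvent _stop = new ManualResetEvent(false);
    private readonly ThreadStart _work;
    private readonly int _intervalMs;
    private readonly Thread _thread;
    private bool _started;

    public DedicatedPoller(ThreadStart work, int intervalMs, int maxStackSizeBytes)
    {
        if (work == null) throw new ArgumentNullException("work");
        _work = work;
        _intervalMs = intervalMs;
        // The second constructor argument requests a specific maximum stack size
        // for this thread; passing 0 keeps the default.
        _thread = new Thread(new ThreadStart(Run), maxStackSizeBytes);
        _thread.IsBackground = true; // do not keep the process alive on exit
    }

    public void Start()
    {
        _started = true;
        _thread.Start();
    }

    private void Run()
    {
        // WaitOne returns false on timeout (time to poll again)
        // and true once Dispose has signalled the stop event.
        while (!_stop.WaitOne(_intervalMs, false))
        {
            try
            {
                _work();
            }
            catch (Exception ex)
            {
                // Keep the loop alive; real code would log through its own logger.
                Console.Error.WriteLine(ex);
            }
        }
    }

    // Must not be called from inside the work delegate, because it joins the thread.
    public void Dispose()
    {
        _stop.Set();
        if (_started) _thread.Join();
        _stop.Close();
    }
}
```

Because the interval is measured from the end of one run to the start of the next, a slow or sleeping callback simply delays its own next run rather than occupying additional pool threads.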

Edit: On further inspection, the ThreadPool threads that get generated and stick around have the following call stack:

ntdll!KiFastSystemCallRet 
ntdll!ZwWaitForSingleObject+c 
KERNELBASE!WaitForSingleObjectEx+98 
kernel32!WaitForSingleObjectExImplementation+75 
mscorwks!PEImage::LoadImage+1af 
mscorwks!CLREvent::WaitEx+117 
mscorwks!CLREvent::Wait+17 
mscorwks!ThreadpoolMgr::SafeWait+73 
mscorwks!ThreadpoolMgr::WorkerThreadStart+11c 
mscorwks!Thread::intermediateThreadProc+49 
kernel32!BaseThreadInitThunk+e 
ntdll!__RtlUserThreadStart+70 
ntdll!_RtlUserThreadStart+1b

As far as obvious goes, I would check that each of the timers is being disposed of when it is no longer in use. I would also get a memory dump and see if all the threads that are left behind can trace their creation back to any single timer or location that makes timers. This could give you a better idea of what callback might be holding onto a reference to something and causing the threads not to be garbage collected. — Max Young, Jun 08 '18 at 18:44
I should mention that the timer callbacks correspond to tasks that must be continuously carried out while the program is running (i.e. we cannot dispose the timers between elapsed event handler calls, and the timers must exist for the duration the program is run). The program is meant to run in this way for multi-month periods. We know that this type of polling in our code is not ideal, but it is the architecture we are stuck with until we have time to migrate to a more event driven architecture. — MiniWalrus, Jun 08 '18 at 18:54
Sorry, I suspected as much but wanted to confirm that. I don't suppose any of the timers spawn timers? — Max Young, Jun 08 '18 at 18:58
No worries. Thanks for the input. To the best of my knowledge, none of the timers spawn timers. — MiniWalrus, Jun 08 '18 at 20:40
Only other thing I can think of is that `ThreadPool` is static to the .NET Framework, so everything in your application will be getting threads from it. Maybe something else in your application is requesting the threads as opposed to the timers. I know each `Timer` has a thread from the `ThreadPool` under the hood, but I have never observed behavior where those threads pile up in .NET 4.5 and later. The closest I have ever seen is when our callback was to a method on the main form of the application, so the `Timer`s we were spawning were never disposed. — Max Young, Jun 08 '18 at 21:46
I used dotMemory at the time to profile the application and see why the `Timer`s were being retained, and that is when I saw that each `Timer` had a thread from the thread pool under the hood. Also, each time we opened a specific form we never disposed of the timer, and it never got collected because it had a callback to a method on the main form. Maybe each time the timer elapses you could try unhooking your Elapsed event and rehooking it, to see if that allows .NET to clean up the threads that are being orphaned? — Max Young, Jun 08 '18 at 21:48
Also after looking at the documentation for .NET 2.0 `Timers` I find this very interesting `The Elapsed event is raised on a ThreadPool thread. If processing of the Elapsed event lasts longer than Interval, the event might be raised again on another ThreadPool thread. Thus, the event handler should be reentrant.` This could mean your threads could pile up if they take longer and longer to complete their work since the same timer can get multiple threads. https://msdn.microsoft.com/en-us/library/system.timers.timer(v=vs.80).aspx — Max Young, Jun 08 '18 at 21:53
To your comment about other things requesting ThreadPool threads, we do not believe this to be the case. From both the MSDN docs on the ThreadPool class and analysis of thread stacks from memory dumps, I've seen four reasons that the ThreadPool is used: — MiniWalrus, Jun 08 '18 at 23:24
1) A System.Threading.Timer or System.Timers.Timer needs to be run; 2) a callback from an asynchronous I/O procedure needs to be run; 3) the ThreadPool.RegisterWaitForSingleObject method is used to queue delegates to the thread pool and signal them for execution via a synchronization object; 4) the ThreadPool.QueueUserWorkItem method is used to queue a delegate on the thread pool until a thread is available. — MiniWalrus, Jun 08 '18 at 23:25
Unhooking and rehooking up the elapsed event sounds like a possible solution. I will look more into that, thanks! — MiniWalrus, Jun 08 '18 at 23:27
And to your last point about reentrant event handlers, we have ensured that all our timer callbacks stop their timer at the beginning of execution, and start it again at the end, with the business logic surrounded by a try-catch. It is still possible that during the callback, we get a context switch before or during the call to the timer's stop method, with the identical timer callback getting called again before the timer is stopped, but this is assumed to be exceptionally rare. — MiniWalrus, Jun 08 '18 at 23:43
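A common way to close the small race described in the comment above is to set `AutoReset` to `false`, so the timer raises `Elapsed` once per `Start()` and is re-armed only after the handler has finished. The following is a minimal sketch under that assumption, written for C# 2.0; `RearmingTimer` and `PollAction` are hypothetical names, not anything from the original code base.

```csharp
using System;
using System.Timers;

// Hypothetical delegate type; .NET 2.0 has no parameterless Action.
public delegate void PollAction();

// Hypothetical wrapper: a System.Timers.Timer whose handler can never overlap
// with itself, because the timer fires once and is restarted by the handler.
public sealed class RearmingTimer
{
    private readonly Timer _timer;
    private readonly PollAction _work;
    private volatile bool _stopped;

    public RearmingTimer(double intervalMs, PollAction work)
    {
        if (work == null) throw new ArgumentNullException("work");
        _work = work;
        _timer = new Timer(intervalMs);
        _timer.AutoReset = false; // raise Elapsed only once per Start()
        _timer.Elapsed += new ElapsedEventHandler(OnElapsed);
    }

    public void Start()
    {
        _stopped = false;
        _timer.Start();
    }

    public void Stop()
    {
        _stopped = true;
        _timer.Stop();
    }

    private void OnElapsed(object sender, ElapsedEventArgs e)
    {
        try
        {
            _work();
        }
        catch (Exception ex)
        {
            // Real code would log through its own logger.
            Console.Error.WriteLine(ex);
        }
        finally
        {
            // Re-arm only after the work is done, and only if Stop() was not requested.
            if (!_stopped) _timer.Start();
        }
    }
}
```

With this arrangement each timer uses at most one pool thread at a time. A narrow window remains if `Stop()` is called between the `_stopped` check and `_timer.Start()`, which a lock would remove if it matters.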
.NET 2.0 still had a somewhat reasonable upper limit on the maximum number of thread pool threads, but then again it isn't that likely that you actually use it. You need to call ThreadPool.SetMaxThreads() in your Main() method. Say 10. Beware that you are hiding a pretty ugly problem that can bite you in a different way now; delays might be quite a bit longer. — Hans Passant, Jun 11 '18 at 17:24
Putting a hard limit on the number of threadpool threads is a possible solution, and through some preliminary tests, it seems to fix the thread stack increase issue. But like you said, we would be worried about threadpool deadlocks or delays caused by excessive timer callback queuing. — MiniWalrus, Jun 12 '18 at 21:04
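For reference, the cap discussed in the last two comments could be applied and monitored along these lines. This is a hedged sketch in C# 2.0 syntax: `ThreadPoolCap` and its method names are hypothetical, and only the `ThreadPool.GetMaxThreads`, `ThreadPool.SetMaxThreads`, and `ThreadPool.GetAvailableThreads` calls are framework APIs.

```csharp
using System;
using System.Threading;

// Hypothetical helper for capping and observing the worker-thread pool.
public static class ThreadPoolCap
{
    // Caps worker threads while leaving the I/O completion port limit unchanged.
    // Returns false if the runtime rejects the value, for example when it is
    // lower than the number of processors.
    public static bool CapWorkerThreads(int maxWorkers)
    {
        int currentWorkers, currentIocp;
        ThreadPool.GetMaxThreads(out currentWorkers, out currentIocp);
        return ThreadPool.SetMaxThreads(maxWorkers, currentIocp);
    }

    // Number of worker threads currently executing work items (max minus available).
    public static int BusyWorkerThreads()
    {
        int maxWorkers, maxIocp, freeWorkers, freeIocp;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIocp);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIocp);
        return maxWorkers - freeWorkers;
    }
}
```

Calling `ThreadPoolCap.CapWorkerThreads(10)` early in `Main()` would apply the limit suggested above. Logging `BusyWorkerThreads()` periodically is one way to notice when the pool is running at its cap, which is when queued timer callbacks start to be delayed.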

0 Answers