I have a multi-threaded .NET application. It sends and receives a lot of UDP packets over the network and performs a lot of calculations.
I start this app every day, and it runs for the whole business day. The number of concurrent threads (checked in Task Manager) usually varies between 60 and 90. CPU usage varies a lot, with occasional spikes that push the server's total CPU usage to 100%, but I would say the app's AVERAGE CPU usage is low, under 5%.
On some random days, usually when the number of received packets is higher than usual, the number of concurrent threads in this app rises to ~250 and the server's CPU usage stays flat at 100%. The app is not using the whole 100% (other apps run on this server too), but it consumes all the CPU that is available, pushing total utilization to 100%.
The number of threads does NOT keep increasing, as it would if there were some kind of deadlock or memory leak. But it also doesn't decrease over time. The memory used by the process doesn't increase over time either; it stays at the same level as on days when the problem doesn't occur.
I believe there might be a bug in the source code that triggers some kind of infinite loop or something similar.
Based on this post, I’ve tried using Microsoft’s Debug Diagnostic Tool v2 Update 3 (DebugDiag), but I’m running into some problems with it, which I describe below:
1) I followed all the instructions on the link above. I was able to create and activate the rule to detect the high CPU usage.
2) However, when the problem begins to happen, I see a lot of new processes being created in Task Manager (with the same name as my app's process), one after another, all of them with the status "Suspended". To be clear: these new Suspended processes are not created by my app; they are created by the Debug Diagnostic Collection tool when it begins collecting the data for the dump files.
3) Looking at the main dialog of the DebugDiag 2 Collection tool, I then see the status of the rule as "Completed", even though I have not explicitly deactivated the rule and the problem is still occurring.
4) Then I use the DebugDiag 2 Analysis tool to analyze the generated dump files. I select "Performance Analyzers/PerfAnalysis" and all the dump files, and start the analysis.
5) The result of the analysis follows below:
I don't think this System.ArgumentException has anything to do with my app. Judging by the stack trace, the exception is thrown INSIDE the analysis tool. I don't know whether, for instance, the several processes with the same name that are created during data collection cause the analysis tool to try to add multiple records with the same key to a Dictionary.
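To illustrate the kind of failure I suspect, here is a minimal sketch. It is NOT the tool's actual code: the class name, the `AddTwice` helper, and the idea of keying records by process name are all my own assumptions. It only shows the documented .NET behaviour that `Dictionary<TKey,TValue>.Add` throws a System.ArgumentException when the key is already present.

```csharp
using System;
using System.Collections.Generic;

static class DuplicateKeyDemo
{
    // Hypothetical helper: adds the same key twice and reports what happens.
    // Keying by process name is an assumption made purely for illustration.
    public static string AddTwice(string processName)
    {
        var recordsByProcessName = new Dictionary<string, int>();
        recordsByProcessName.Add(processName, 1);
        try
        {
            // Second Add with an existing key: Dictionary.Add rejects duplicates.
            recordsByProcessName.Add(processName, 2);
            return "no exception";
        }
        catch (ArgumentException ex)
        {
            return ex.GetType().Name;
        }
    }

    static void Main()
    {
        // Prints the name of the exception type raised by the duplicate Add.
        Console.WriteLine(AddTwice("MyApp.exe"));
    }
}
```

If the analyzer does something similar with dumps that share one process name, that would explain the exception, but this is only a guess on my part.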
The fact is that this issue is preventing me from figuring out the cause of the problem. I'm aware that there are other analysis tools such as dotTrace and ANTS, but I would really prefer to try a free tool before moving to a commercial one. I've even contacted the developer of CodeTrack, which is free and looks like a fine tool, but the tips and recommendations he gave me aren't easy for me to follow, because:
- My app is running on production servers.
- It's not easy to simulate the production environment on a test machine, since I'm using real-time market data to feed the app.
- Before anyone suggests using Visual Studio's own profiling tool, the production servers don't have VS installed (and it's not my intention to install it on them).
So, I guess my real question here is: does anyone know what I am doing wrong (if anything) when using the MS Debug Diagnostic Tool? Is the issue I'm facing really a bug? Is the tool supposed to create several Suspended processes during data collection? How can I fix this and make it work properly so I can use it to investigate my problem?