
I'm migrating a service-based integration platform from .NET Framework to .NET Core. The original versions of the integration platform have proven very successful and, compared with replacing it with an off-the-shelf integration solution, it has a far better ROI.

So after redeveloping the code, all tests have been passing, and with a single IIS server I have achieved higher performance than I could with two IIS servers running the original versions.

Except... if I go over ~3 messages/sec with multiple clients, I start seeing duplicate GUID key errors when trying to save instrumentation data to my DB. All of these errors are generated by the on-ramp service. The on-ramp places each message on a queue; the messages are then consumed by an off-ramp service and sent to the destination (for this load test, the destination is a file folder).

Even though the off-ramp runs on the same server as the on-ramp, we do not see any duplication errors generated by the off-ramp. I suspect this is because the queue makes processing linear, so only one instance of the off-ramp is running at any time, whereas the on-ramp has up to 4 clients firing concurrent messages at its API.
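The serializing effect of the queue can be modelled with a minimal sketch. It is written in Python rather than the platform's C#, and the names (`run_pipeline`, `on_ramp`, `off_ramp`) and the single-consumer setup are assumptions for illustration. Several client threads enqueue at the same time, but one consumer drains the queue, so off-ramp handling never overlaps.

```python
import queue
import threading


def run_pipeline(num_clients=4, msgs_per_client=25):
    """Hypothetical model: concurrent on-ramp clients, one serial off-ramp consumer."""
    q = queue.Queue()
    delivered = []       # stand-in for files written to the destination folder
    active = 0           # off-ramp handlers running right now
    max_active = 0       # peak off-ramp concurrency observed
    lock = threading.Lock()

    def on_ramp(client_id):
        # Each client thread pushes its messages at the same time as the others.
        for i in range(msgs_per_client):
            q.put(f"client{client_id}-msg{i}")

    def off_ramp():
        nonlocal active, max_active
        while True:
            msg = q.get()
            if msg is None:  # sentinel: all producers are done
                break
            with lock:
                active += 1
                max_active = max(max_active, active)
            delivered.append(msg)
            with lock:
                active -= 1

    consumer = threading.Thread(target=off_ramp)
    consumer.start()
    producers = [threading.Thread(target=on_ramp, args=(c,)) for c in range(num_clients)]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    q.put(None)  # sent only after every producer has finished
    consumer.join()
    return delivered, max_active
```

Because there is only one consumer thread, `max_active` can never exceed 1. A bug that depends on overlapping executions would therefore surface in the concurrently called on-ramp but not in the queue-fed off-ramp.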

Initially I thought the issue was caused by a static global variable class I had implemented, crossing process boundaries. But in that case I would expect to see the issue in the off-ramp as well, as the service architectures of both are virtually identical.

Summary of thoughts on issue:

  • If it were a pure coding issue, errors would also happen at low message rates.
  • The error would also be seen on the off-ramp if the GUID duplication were due to chance.
  • The on- and off-ramps both run on the same server, but duplication is only seen on the on-ramp, i.e. the on-ramp does not impact the off-ramp and vice versa.
  • Duplication has to be due to shared memory between concurrently running on-ramp instances, caused by the multiple-client scenario.

To try and resolve the issue I removed the static global variable class but I'm still seeing the duplication errors.

This issue was never observed in the original IIS implementation (after millions of messages processed). I suspect the issue is with process isolation in the IIS-hosted Kestrel .NET Core service host. From what I have read, there is good isolation between different apps (based on IIS path) but not within the same app, i.e. within the same IIS app pool. This could explain why .NET Core does not support multiple apps running in the same IIS app pool.
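A small Python analogy shows whether a static class is shared between concurrent requests. It assumes that, with ASP.NET Core hosted in IIS, all requests for one app are served by threads of a single process (the IIS worker process for in-process hosting, or the Kestrel process for out-of-process hosting), and that a static field has one copy per process. `StaticHolder` and `simulate_requests` are hypothetical names.

```python
import threading


class StaticHolder:
    """Hypothetical analogue of a C# static class: one copy per process."""
    instrumentation = {"message_id": None}


def simulate_requests(num_requests=8):
    seen = [None] * num_requests

    def handle(n):
        # Every simulated request records which object it actually sees.
        seen[n] = id(StaticHolder.instrumentation)

    threads = [threading.Thread(target=handle, args=(n,)) for n in range(num_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

In this model every request sees the same object, so nothing isolates one request's state from another's within the process. Separate processes (for example, different app pools) would each get their own copy.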

If anyone has a good idea how I can achieve process isolation between instances of the same app running in the same IIS app pool, I would appreciate your thoughts/suggestions.

  • If there isn't any diagram of the system components, sample code, or more, I suggest you add as much application-level logging as you can and then analyze the log entries. It is simply impossible for anyone to comment on vague descriptions alone. – Lex Li Jan 05 '23 at 01:09
  • It's hard to make recommendations based on vague descriptions alone. Since the problem did not arise in the original IIS implementation, have you checked the IIS configuration? As a community member has said, you need to try to generate some logs to show what's going on. – TengFeiXie Jan 05 '23 at 09:12
  • My hope was that someone might be able to comment on the process isolation within an IIS App pool running Kestrel. At this point I believe the overall details of the solution and code are not the issue and will add confusion. For example 'can static classes be shared between different instances of a service running on Kestrel in the same IIS app pool?' If so, I know I need to eliminate all static classes and methods then re-test. – Martin Jan 05 '23 at 17:56
  • After running more tests, I found that although a duplicate KEY error is still being raised, the core message is not being impacted. I now suspect the issue might be with the generation of the key using the Guid.NewGuid() method: at higher rates of processing, the method may be returning the same GUID to multiple instances within the same app pool. Different app pools calling the NewGuid() method do not appear to be impacted. – Martin Jan 05 '23 at 20:48
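The Guid.NewGuid() suspicion can be sanity-checked. `Guid.NewGuid()` produces GUIDs with 122 bits of random entropy, so a genuine collision is vanishingly unlikely even under heavy load. The sketch below uses Python's `uuid.uuid4()` as a stand-in for the .NET method (an assumption, not the .NET API itself) and generates keys from several threads at once to confirm that none repeat.

```python
import threading
import uuid


def generate_concurrently(num_threads=8, per_thread=10_000):
    """Generate random (version 4) UUIDs from several threads at once and collect them all."""
    results = []
    lock = threading.Lock()

    def worker():
        batch = [uuid.uuid4() for _ in range(per_thread)]
        with lock:  # merge each thread's batch safely
            results.extend(batch)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If duplicate keys still reach the database, the more likely cause is one generated key being written twice, not two identical keys being generated.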

1 Answer


After running more tests I was able to resolve the issue. The problem was with the scope of the instrumentation variable. At low rates there was never a problem, but at high throughput, the same instrumentation object was being accessed by a second instance of the process.
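The failure described in this answer can be reproduced in a minimal Python sketch. `Instrumentation`, `FakeDb`, and the handler names are hypothetical, and a `threading.Barrier` forces the requests to overlap deterministically to mimic high throughput. With one shared instrumentation object, each request generates its own GUID, but later requests overwrite the shared field before the earlier ones save, so several requests insert the same key. Creating the object per request removes the collision.

```python
import threading
import uuid


class Instrumentation:
    """Hypothetical per-message instrumentation record."""
    def __init__(self):
        self.record_id = None


class FakeDb:
    """Stand-in for a table whose primary key is a GUID."""
    def __init__(self):
        self.rows = {}
        self.duplicate_key_errors = 0
        self._lock = threading.Lock()

    def insert(self, record_id, payload):
        with self._lock:
            if record_id in self.rows:
                self.duplicate_key_errors += 1  # a real DB would raise a duplicate key error
            else:
                self.rows[record_id] = payload


SHARED = Instrumentation()  # the bug: one object for every concurrent request


def handle_shared(msg, db, barrier):
    SHARED.record_id = uuid.uuid4()   # each request writes its own key...
    barrier.wait()                    # ...but all requests overlap here
    db.insert(SHARED.record_id, msg)  # so they all save the last key written


def handle_scoped(msg, db, barrier):
    inst = Instrumentation()          # the fix: one object per request
    inst.record_id = uuid.uuid4()
    barrier.wait()
    db.insert(inst.record_id, msg)


def run(handler, num_requests=4):
    db = FakeDb()
    barrier = threading.Barrier(num_requests)
    threads = [threading.Thread(target=handler, args=(f"msg-{i}", db, barrier))
               for i in range(num_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db
```

With four overlapping requests, `run(handle_shared)` records 3 duplicate key errors and only one row, while `run(handle_scoped)` stores all 4 rows. Without the forced overlap (as at low message rates), the shared version usually works, which is why the bug only appeared under load. In ASP.NET Core, the corresponding change would be to create the instrumentation object inside the request handler or, assuming the built-in dependency injection container is used, to register it with `AddScoped` or `AddTransient` rather than holding it in a static field or singleton.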

The issue was difficult to track down due to the short lived nature of the integration services.

Thanks to anyone who reviewed the question.

Martin
