
I'm doing some stress tests on a saga that uses 2 timeouts. During the test about 21K sagas get created, which would mean 42K timeouts, but I notice that the timeoutsdispatcher queue of the saga endpoint is getting flooded with hundreds of thousands of messages until it crashes because the MSMQ storage limit is hit.

I'm seeing this behavior since I switched the persistence mechanism from RavenDB to SQL Server.

Does anyone have an idea what could be wrong?

Transport: MSMQ
Persistence: NHibernate

Packages used:

NHibernate version 4.0.4.4000  
NServiceBus version 5.2.14  
NServiceBus.Host version 6.0.0  
NServiceBus.Log4Net version 1.0.0  
NServiceBus.NHibernate version 6.2.7  

Test setup:
* endpoint 1 is sending 22000 messages to endpoint 2.
* endpoint 2 hosts a saga that is started by that message.
* each saga publishes an event and then requests 2 timeouts: 1 at 4 minutes, 1 at 10 minutes.
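
For reference, a rough sketch of what each saga does when it starts (the type names StartOrder, OrderStarted, FirstTimeout and SecondTimeout are placeholders, not my actual types):

using System;
using NServiceBus;
using NServiceBus.Saga;

// Placeholder message and saga data types.
public class StartOrder : ICommand { public Guid OrderId { get; set; } }
public class OrderStarted : IEvent { public Guid OrderId { get; set; } }
public class FirstTimeout { }
public class SecondTimeout { }

public class OrderSagaData : ContainSagaData
{
    public Guid OrderId { get; set; }
}

public class OrderSaga : Saga<OrderSagaData>,
    IAmStartedByMessages<StartOrder>,
    IHandleTimeouts<FirstTimeout>,
    IHandleTimeouts<SecondTimeout>
{
    public void Handle(StartOrder message)
    {
        Data.OrderId = message.OrderId;

        // Publish an event, then request the 2 timeouts.
        Bus.Publish(new OrderStarted { OrderId = message.OrderId });
        RequestTimeout<FirstTimeout>(TimeSpan.FromMinutes(4));
        RequestTimeout<SecondTimeout>(TimeSpan.FromMinutes(10));
    }

    public void Timeout(FirstTimeout state) { }
    public void Timeout(SecondTimeout state) { }

    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<OrderSagaData> mapper)
    {
        mapper.ConfigureMapping<StartOrder>(m => m.OrderId).ToSaga(s => s.OrderId);
    }
}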

Observed behavior:
* endpoint 1 sends the 22K messages in under a minute.
* endpoint 2 (the saga) processes 5 to 10 messages per second.
* after 4 minutes the first timeouts are fired, while endpoint 2 is still processing messages from its queue and thus is still creating new saga instances.
* from that moment on, the timeoutsdispatcher queue of the saga endpoint is getting filled with messages.
* after 10 minutes or so, the timeoutsdispatcher queue already contains over 170K messages and is still filling up.
* That continues until endpoint 2 crashes because the MSMQ storage limit is hit, or all messages from the input queue are processed. If the latter occurs first, the timeoutsdispatcher queue message count starts to decrease until it eventually reaches 0.

Marc Selis

1 Answer


Did you perform the same stress test with RavenDB? And is SQL Server on a machine that's more-or-less equally powerful, with fast drives?

Update

Some checks for your saga

  • Is the [Unique] attribute used, and is it used properly? In other words, do you use unique ids for every incoming message, so that every incoming message that spawns 2 timeouts creates its own saga instance? If every incoming message is accessing the same saga, that alone would limit throughput dramatically. Imagine the saga instance was already created (otherwise the explanation becomes too complex): Message1 comes in, tries to find the row in the database, finds it and locks it. Message2 comes in at the same time and finds the row, but it's locked, so it goes into retry. Message3 up until Message100 come in (if concurrency is set to 100) and all try to do the same thing, immediately failing. You can see how this will limit throughput for a while :) See the sketch after this list.
  • Are the correct indexes in place on your Saga table(s) and Timeout tables?
  • What is your maximum concurrency level set to?
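
As a sketch of the first check (OrderSagaData and OrderId are placeholder names, not necessarily yours), the [Unique] attribute sits on the saga's correlation property, typically the same property you map with ToSaga in ConfigureHowToFindSaga:

using System;
using NServiceBus.Saga;

public class OrderSagaData : ContainSagaData
{
    // [Unique] makes the NHibernate saga persister enforce a unique constraint on
    // this column. When every incoming message carries a distinct OrderId, each
    // message starts its own saga row; if they all share the same id, they all
    // contend for (and lock) the same row, which is exactly what kills throughput.
    [Unique]
    public Guid OrderId { get; set; }
}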

Based on the numbers: you say you send 22k messages, resulting in 44k timeout messages. Imagine all of these timeouts are sitting in MSMQ. Imagine the messages are really, really small, like 1 KB, and the header information added by NServiceBus takes up another 2 KB. That's 44,000 times 3 KB, roughly 130 megabytes. So there's no way that can fill up a default MSMQ installation, which has a quota of 1 GB.

This probably means your dead-letter queue is filled up completely. Find more information on MSMQ connection strings and set the appropriate connection string, for example:

<connectionStrings>
  <add name="NServiceBus/Transport"
    connectionString="deadLetter=false;journal=false;"/>
</connectionStrings>

Messages with the TimeToBeReceived attribute set (link) end up in the dead-letter queue. Purging queues will also make all messages go to the dead-letter queue, unless you set the proper connection string.
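
For illustration (ExpiringEvent is a made-up message type, not something from your code), a TimeToBeReceived attribute on a message looks like this; with the default connection string, a message MSMQ discards after that interval can land in the dead-letter queue:

using System;
using NServiceBus;

// MSMQ discards this message if it isn't consumed within 10 minutes; with
// dead-lettering enabled, the discarded message ends up in the dead-letter queue.
[TimeToBeReceived("00:10:00")]
public class ExpiringEvent : IEvent
{
    public Guid OrderId { get; set; }
}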

Dennis van der Stelt
  • Yes, I'm doing the stress test on the same machine as I did it first with RavenDB: my local laptop with a built-in SSD. There is nothing wrong with the message processing itself; the saga endpoint is running fine, processing 5 to 10 messages per second. I'm just seeing that the timeoutsdispatcher queue of my saga is getting flooded with messages the moment the first of the 2 timeouts of my created sagas fires. – Marc Selis Apr 26 '16 at 12:57
  • I added a description of my test setup and observations. – Marc Selis Apr 26 '16 at 13:22
  • @MarcSelis Need any more information or are you okay with the current answer? – Dennis van der Stelt May 09 '16 at 07:11
  • As stated in my observations it is definitely the timeoutsdispatcher queue and not the dead-letter queue that is getting flooded with messages. It seems like the timeout mechanism is creating multiple messages for the same timeout over and over again, until all messages from the input queue are processed. – Marc Selis May 09 '16 at 14:32