
Yesterday I requested additional budget to buy licenses for NSB (36-80 cores) after some very promising testing with the distributor.

It should be mentioned that we are currently using the distributor to solve a plumbing problem and not yet for real business events, but that will come later.

Today my very skilled colleague started to use the bus as well, only his performance demands are MUCH higher than mine, due to the nature of his project. His test requires a lot more than the average 30-50 msg per second that the distributor is currently giving me.

Here is what we have tried:

  1. More workers.
  2. More threads on the worker(s).
  3. More threads on the distributor.
  4. Enabling/disabling DTC.
  5. All of the above in a bare-bones proof-of-concept setup with a message carrying only an id.

Whatever we tried, we were unable to get any more than around 100 msg distributed per second to the workers.

This is exceptionally bad, as I have already applied for more budget money. If we cannot get better performance very quickly, we will have to abandon the project and we will be in some serious trouble :S.

Question:

Is there a limitation on the Developer Licenses we are currently using that is causing this, or is there a serious performance issue with MSMQ? The latter is described for example here: http://ayende.com/blog/4251/what-am-i-missing-msmq-perf-issue

I've been reading a lot on NServiceBus's own site about the licenses, but nowhere is there a clear description of the limitations of the developer licenses.

Hope someone can help me out :).

[UPDATE]

I went back to basics and tried to reproduce the problem with NSB's own code sample ScaleOut, by simply sending 10000 messages and seeing how the workers and distributor would react. So I started the distributor and the 2 workers (the workers had 100 threads each) and, guess what, the problem re-emerged.

  1. Fortunately this indicates that we have not configured our Distributor/Worker setup completely wrong.
  2. Unfortunately this means we still have a performance issue with the Distributor.
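For anyone wanting to reproduce the numbers quoted here, this is a minimal sketch of how handler-side throughput could be counted. It is not NServiceBus code: `ThroughputCounter` and its members are hypothetical names, and the `Parallel.For` loop only stands in for worker threads handling messages. In a real worker, the message handler would call `MessageHandled()` once per message.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: counts handled messages across threads and reports msg/sec.
public static class ThroughputCounter
{
    private static long handled;
    private static readonly Stopwatch Clock = Stopwatch.StartNew();

    // Call once per handled message; safe to call from many worker threads.
    public static void MessageHandled()
    {
        Interlocked.Increment(ref handled);
    }

    // Total number of messages counted so far.
    public static long Handled
    {
        get { return Interlocked.Read(ref handled); }
    }

    // Average rate since the counter type was first used.
    public static double MessagesPerSecond()
    {
        var seconds = Clock.Elapsed.TotalSeconds;
        return seconds > 0 ? Handled / seconds : 0;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for 4 worker threads handling 10000 messages carrying only an id.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
        Parallel.For(0, 10000, options, id => ThroughputCounter.MessageHandled());

        Console.WriteLine("{0} messages, {1:F0} msg/sec",
            ThroughputCounter.Handled, ThroughputCounter.MessagesPerSecond());
    }
}
```

Counting on the handler side matters here because sending alone is fast; the bottleneck being investigated is how quickly messages reach and are processed by the workers.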

Then I wondered if this could really be the case and started running some simple tests, tweaking the number of threads on the Distributor and Workers. After some attempts I got the throughput up to as much as 400-500 msg per second in the sample. Here is what I discovered:

Observations/Solution:

  1. The Distributor needs more threads than just the 1, but not too many. Right now I'm running 2 threads per worker I start up.
  2. The Workers will typically have the same performance whether I'm running 20 or 100 threads. So instead of turning up the number of threads, I turned up the number of workers, which did the trick.
  3. If the number of threads on either the distributor or the workers is too high, they appear to run into MSMQ transaction battles, where they block each other and make the system clog up in bursts. I can easily reproduce the clog-ups with the ScaleOut sample and my own code. However, the TX battle is just a wild guess based on articles I have read; I have no proof that this is what is happening.
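For reference, as far as I understand it, the thread count in NServiceBus 3.x is normally set through the `MsmqTransportConfig` section in the endpoint's app.config. A minimal fragment is sketched below. The values are illustrative assumptions, not recommendations, and a self-hosted setup may supply this configuration differently.

```xml
<configuration>
  <configSections>
    <section name="MsmqTransportConfig"
             type="NServiceBus.Config.MsmqTransportConfig, NServiceBus.Core" />
  </configSections>

  <!-- NumberOfWorkerThreads="4" and MaxRetries="5" are illustrative values only -->
  <MsmqTransportConfig NumberOfWorkerThreads="4" MaxRetries="5" />
</configuration>
```

Changing only `NumberOfWorkerThreads` between runs makes it easier to see where extra threads stop helping and the clog-ups begin.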

Followup Questions:

  1. What to do now? Should we replace MSMQ with something else or is this issue something internal to NSB which may be optimized/fixed in later versions?
  2. Is this the intended way for the Distributor to work, meaning our only solution is to fire up more workers?
  3. Will multiple workers on the same endpoint, but on a different machine than the one the distributor is running on, not cause a competing consumer situation, which might again cause MSMQ TX battles between the workers?
  4. There is one important difference between the sample and our own code: we have disabled the Raven subscription storage and are running purely on MSMQ, but as far as I am aware the distributor does not use RavenDB for storage. Am I wrong, and could this be a place to gain some performance?

I'm doing some distributed tests right now to see if there's an issue starting multiple workers on the same machine, but not on the same machine as the Distributor. My hope is that this is possible without having to set up individual queues for each worker, as we have already ordered the extra servers for the workers and don't have budget for more.

So far I'm a bit disappointed that I cannot simply turn up the number of threads on a worker, start with just the one worker, and later scale out to more machines, each with 1 worker. Now I'm forced to have multiple workers on one machine :/.

If there's any small point about the Distributor/workers that I'm missing, please share, as this is driving me crazy :/.

[UPDATE 2]

If I run the ScaleOut sample outside Visual Studio with NServiceBus.Integration NServiceBus.Distributor/Worker as command-line parameters and just 1 worker, I can get a throughput of 400-500 msg/sec.

This is great, but it does not explain what I have done wrong in our own setup, where we are self-hosting. Take a look at our configurations and tell me if there's something fishy:

Distributor:

        var queuePrefix = ConvertFriendlyNameTo.QueueName(AppDomain.CurrentDomain.FriendlyName);

        return NServiceBus.Configure.With()
            .DefineEndpointName(queuePrefix)
            .Log4Net(ObjectFactory.GetInstance<IServiceBusLog>().Build())
            .StructureMapBuilder()
            .JsonSerializer()
            .AsMasterNode()
            .RunDistributorWithNoWorkerOnItsEndpoint()
            .MsmqTransport()
            .IsTransactional(true)
            .DisableTimeoutManager()
            .DisableSecondLevelRetries()
            .UnicastBus()
            .CreateBus()
            .Start(() => NServiceBus.Configure.Instance.ForInstallationOn<NServiceBus.Installation.Environments.Windows>().Install());

Worker:

        var queuePrefix = ConvertFriendlyNameTo.QueueName(AppDomain.CurrentDomain.FriendlyName);

        return NServiceBus.Configure.With()
            .DefineEndpointName(queuePrefix)
            .Log4Net(ObjectFactory.GetInstance<IServiceBusLog>().Build())
            .StructureMapBuilder()
            .JsonSerializer()
            .EnlistWithDistributor()
            .MsmqTransport()
            .IsTransactional(true)
            .DisableTimeoutManager()
            .DisableSecondLevelRetries()
            .UnicastBus()
            .CreateBus()
            .Start(() => NServiceBus.Configure.Instance.ForInstallationOn<NServiceBus.Installation.Environments.Windows>().Install());

Is there anything we are doing wrong here which might cause the performance difference?

Kind regards.

JasonMArcher
Christian Mikkelsen
  • Something must be off, you can easily get 1200-1300 msg/s on a single box without the distributor. What version is this? – Andreas Öhlund Mar 23 '13 at 05:58
  • If we run a sendonly() server/publisher without anything else, locally, we can push 10000 msg to a queue within 5-6 seconds, which is beautiful. The problem arises when we use the distributor, and the distributor is essential to our solution as scaling is the whole point. We are running with the latest version from NuGet, fetched yesterday, 3.3.5 I think. – Christian Mikkelsen Mar 23 '13 at 06:46
  • It could look like an MSMQ transaction battle, or that there's some 2-workers-only limitation on the developer license. – Christian Mikkelsen Mar 23 '13 at 06:49
  • Hi Andreas, I have updated my question with some observations. If you have the time to spare and could give me your 5 cents on what I'm seeing here, I would be grateful and quite possibly a new customer :). – Christian Mikkelsen Mar 23 '13 at 15:52
  • v4 has a much better threading strategy, 10 threads is where I get the best perf on my machine. The distributor is meant to work with workers on separate machines since it only adds overhead if you run local workers. – Andreas Öhlund Mar 23 '13 at 19:25
  • 10 worker threads or distributor threads? Any eta on v4? – martinlund Mar 23 '13 at 19:41
  • So now I've lost it. I ran the ScaleOut example outside VS2012 from the command line with Integration as parameter, and suddenly 1 worker can consume 400-500 msg per second. Have I missed some important point with the profiles/installers and licenses? Both worker and distributor were configured with 4 threads. – Christian Mikkelsen Mar 23 '13 at 21:10
  • I think maybe we've been owned by running the processes inside VS2012... will do some more testing and write back later. – Christian Mikkelsen Mar 23 '13 at 22:01
