
On a Windows 7 quad-core machine with hyper-threading enabled, I have a hundred threads communicating via BlockingCollection<T> (all initialized with the default constructor, thus using ConcurrentQueue<T> internally).

All threads receive 10-100 messages per second, except one that receives just 4-20 messages per day.

My issue is related to this last consumer: most of the time it is blocked waiting for new messages, but when a message is ready to be consumed it should be handled as soon as possible, ideally instantly.

The problem is that when a message is added to the BlockingCollection dedicated to this consumer, it is only received after a few seconds (Take() returns 3 to 12 seconds after the message was enqueued).

I guess the problem is related to how Windows schedules this thread.

I've tried increasing the ThreadPriority of this consumer, without any improvement. Then I've tried setting its processor affinity to a dedicated core, and changing all the other threads' affinity to use the remaining cores, but still without improvement.
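For reference, here is a minimal sketch of how such a priority boost and core pinning can be done in .NET; the helper name and the P/Invoke wrapper are illustrative, not the actual application code:

using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Threading;

static class ConsumerThreadSetup
{
    [DllImport("kernel32.dll")]
    static extern uint GetCurrentThreadId();

    // Raise the calling thread's priority and pin it to a single core (0-based index).
    public static void PinCurrentThreadToCore(int coreIndex)
    {
        Thread.CurrentThread.Priority = ThreadPriority.Highest;

        // Keep the managed thread bound to its current OS thread so the affinity sticks.
        Thread.BeginThreadAffinity();
        int osThreadId = (int)GetCurrentThreadId();
        ProcessThread osThread = Process.GetCurrentProcess().Threads
            .Cast<ProcessThread>()
            .Single(t => t.Id == osThreadId);
        osThread.ProcessorAffinity = new IntPtr(1 << coreIndex);
    }
}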

How can I address this issue? What's the actual problem?

This is the thread loop (commands is the BlockingCollection):

while (!commands.IsCompleted)
{
    _log.Debug("Waiting for command.");

    var command = commands.Take(); // blocks until a message is available

    _log.DebugFormat("Received: {0}", command);

    command.ApplyTo(target);

    _log.InfoFormat("Dispatched: {0}", command);
}

Should I use a ConcurrentQueue directly instead, sleeping for something like 50 milliseconds (an acceptable delay) when no messages are pending?
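A minimal sketch of that polling alternative (ICommand and stopRequested are placeholders for the actual command type and shutdown flag; _log and target are the same as in the loop above):

var pendingCommands = new ConcurrentQueue<ICommand>();

while (!stopRequested)
{
    ICommand command;

    // Drain everything that is currently queued.
    while (pendingCommands.TryDequeue(out command))
    {
        _log.DebugFormat("Received: {0}", command);
        command.ApplyTo(target);
        _log.InfoFormat("Dispatched: {0}", command);
    }

    Thread.Sleep(50); // worst-case added latency when no messages are pending
}

Note that keeping the existing BlockingCollection and looping on TryTake(out command, 50) would bound the wait in the same way without switching collections, as suggested in the comments below.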

Note that no CPU goes above 50% usage.

As for the delay: the logs show that (most of the time) a few seconds pass between "Dispatched: ..." and "Waiting for command." (i.e. on commands.IsCompleted) and a few more between "Waiting for command." and "Received: ..." (i.e. on commands.Take()).

All "sane" threads are I/O bound but they are cooperating: believe me I can't change their design. However they work pretty well: the only missbehaving one is the low load thread that does a different kind of work. I know that thread priority are evil but I didn't find a better solution.

A test that (almost) replicates the issue
Here is a test that almost replicates the issue on the server. "Almost" because the test stresses the CPUs (all go to 100%), while in the actual application all the CPUs stay under 50%:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using NUnit.Framework;

public class ThreadTest
{
    class TimedInt
    {
        public int Value;

        public DateTime Time;
    }
    [Test]
    public void BlockingCollection_consumedWithLowLoad_delaySomeTimes()
    {
        // arrange:
        int toComplete = 0;
        BlockingCollection<KeyValuePair<DateTime, TimedInt>> results = new BlockingCollection<KeyValuePair<DateTime, TimedInt>>();
        BlockingCollection<TimedInt> queue = new BlockingCollection<TimedInt>();
        Action<int> producer = a =>
        {
            int i = 1;
            int x = Convert.ToInt32(Math.Pow(a, 7));
            while (i < 200000000)
            {
                if (i % x == 0)
                {
                    queue.Add(new TimedInt { Time = DateTime.Now, Value = i });
                    Thread.SpinWait(100); // just to simulate a bit of actual work here
                    queue.Add(new TimedInt { Time = DateTime.Now, Value = i + 1 });
                }
                i++;
            }
            Interlocked.Decrement(ref toComplete);
        };
        Action consumer = () =>
        {
            Thread.CurrentThread.Priority = ThreadPriority.Highest; // Note that the consumer has a higher priority
            Thread.CurrentThread.Name = "Consumer";
            while (toComplete > 0)
            {
                TimedInt v;
                if (queue.TryTake(out v, 1000))
                {
                    DateTime now = DateTime.Now;
                    results.Add(new KeyValuePair<DateTime, TimedInt>(now, v));
                }
            }
        };

        // act:
        List<Thread> threads = new List<Thread>();
        threads.Add(new Thread(new ThreadStart(consumer)));
        for (int i = 0; i < 200; i++)
        {
            int index = i; // copy the loop variable: closures capture the variable, not its value
            var t = new Thread(new ThreadStart(() => producer(7 + (index % 3))));
            t.Name = "Producer " + index.ToString();
            threads.Add(t);
            toComplete++;
        }
        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join());

        // assert:
        Assert.AreEqual(0, results.Where(kvp => (kvp.Key - kvp.Value.Time).TotalMilliseconds > 1000).Count());
    }
}

Any ideas?

Giacomo Tesio
  • Well, first, is your box actually overloaded, i.e. CPU at 100% on all cores? Seems unlikely, but I have to ask :) Then, 3 to 12 seconds is just a ridiculously long delay for a BlockingCollection. Are you absolutely convinced that this thread is stuck on the Take() and not blocked/busy somewhere else for all that time? – Martin James Feb 05 '14 at 12:26
  • Your design is very wrong. Do not mess with thread priorities - it does something other than what the name suggests. Why do you have so many threads? CPU-bound operations will not be helped by more threads than there are physical cores, and I/O-bound operations on threads only mean low performance and other issues. Instead, you should use asynchronous I/O operations (which do not actually create a new thread for each operation) and keep your thread count low. Also, how do you know your collection is actually blocking, rather than spinning? – Luaan Feb 05 '14 at 12:27
  • 1
    Pretty vague. Windows puts code that doesn't run for long periods in the dog-house. Its memory pages are written to the paging file when another process needs RAM. This tends to produce a storm of page faults when the code is re-activated and needs to be reloaded into RAM. Seeing this take 20 seconds is rather excessive but technically possible. Thread priority doesn't change this, it is a hard disk issue. – Hans Passant Feb 05 '14 at 12:27
  • 1
    If you think it's because windows schedules the thread with extremely low priority because it's waiting 99.9% of the time why not TryTake with a 50ms timeout and just loop until you get something. – ta.speot.is Feb 05 '14 at 12:27
  • @ta.speot.is Windows, (or any other preemptive multitasker that is sane), does not take such silly actions. If anything, a thread that has just become ready on a signal receives a priority boost. – Martin James Feb 05 '14 at 12:30
  • @MartinJames no CPU use more that 50%. As for the delay: logs show that (most of times) a few seconds passes between "Dispatched: ..." and "Waiting for command." (aka on `commands.IsCompleted`) and a few others between "Waiting for command." and "Received: ..." (aka on `commands.Take()`) – Giacomo Tesio Feb 05 '14 at 12:34
  • @HansPassant -yyyess...ish. Maybe.... 3-12 seconds is a huge amount of time. – Martin James Feb 05 '14 at 12:34
  • @Luaan requirements are a bit more complex than you might expect from this question. All other threads are I/O bound but they are cooperating: believe me, I can't change their design. However they work pretty well: the only misbehaving one is the low-load thread that does a different kind of work. I know that [thread priorities are evil](http://www.codinghorror.com/blog/2006/08/thread-priorities-are-evil.html) but I didn't find a better solution. – Giacomo Tesio Feb 05 '14 at 12:41
  • @GiacomoTesio - that is very odd. There is hardly any code between "Dispatched: ..." and "Waiting for command." - a loop back and an atomic check on the BC state. That should not take seconds, even if the code/stack has indeed been paged out. Like Hans suggests, how busy is your disk with the page file on it? – Martin James Feb 05 '14 at 12:41
  • Ah... there are those logging calls. How does your logger work? Please don't say it first acquires a mutex and then synchronously opens a network file on a server in Novaya Zemlya... – Martin James Feb 05 '14 at 12:46
  • Can you reproduce the issue in the debugger? Check the OS counters for page file usage. To me it doesn't sound like paging because these days paging is rare with lots of memory. Without memory pressure, pages are not evicted from memory. Are you having memory pressure? What OS is this? Is there lots of cached IO? If you're not seeing hard page faults, it is not paging related. – usr Feb 05 '14 at 12:49
  • @usr - seems unlikely to Hans and me too, but we're struggling to explain such a monumental latency on a box that is not ridiculously overloaded. – Martin James Feb 05 '14 at 12:53
  • I'm skeptical that this phenomenon even exists. Scheduling latency in the seconds on a one-socket box is unheard of. Paging unlikely too. Try inserting a `Stopwatch` measurement to make the measurement independent of the logger (see the sketch after these comments). Remove the logging from the loop or only log the collected results every 10th trip. – usr Feb 05 '14 at 12:55
  • @usr Indeed. I'm betting on a .NET log class that is, err.. 'less than optimal'. – Martin James Feb 05 '14 at 13:02
  • @MartinJames I'm using log4net for logs. – Giacomo Tesio Feb 05 '14 at 13:37
  • @usr the server has 8 GB of DDR3 ram, but TaskManager shows that only 1.25 GB are used. – Giacomo Tesio Feb 05 '14 at 13:40
  • @ta.speot.is that's an approach. But I'd like to avoid polling, if possible. And I'd like to understand what's happening under the hood. – Giacomo Tesio Feb 05 '14 at 13:48
  • @GiacomoTesio - sure, but I think that he's suggesting it as a debug approach only. – Martin James Feb 05 '14 at 13:50
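Following up on the Stopwatch suggestion above, here is a minimal sketch of how the consumer loop could time the two gaps independently of log4net (the samples list is just an illustrative way to collect the measurements; commands, target and the command type are the same as in the loop above):

var stopwatch = Stopwatch.StartNew();
var samples = new List<Tuple<long, long>>(); // (loop-back ms, blocked-in-Take ms)

while (!commands.IsCompleted)
{
    long loopBackMs = stopwatch.ElapsedMilliseconds; // time since the previous dispatch
    stopwatch.Restart();

    var command = commands.Take();

    long takeMs = stopwatch.ElapsedMilliseconds;     // time spent blocked in Take()
    samples.Add(Tuple.Create(loopBackMs, takeMs));

    command.ApplyTo(target);
    stopwatch.Restart();                             // start timing the next loop-back gap
}

Dumping samples only after the loop ends keeps the measurement free of any logger overhead.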

0 Answers