3

I have the following code that I populate users from a source, for the sake of example it is as below. what I want to do is to consume BlockingCollection with multiple consumers.

Is below the right way to do that? Also what would be the best number of threads ? ok this would depend on hardware, memory etc. Or how can i do it in a better way?

Also would below implementation ensure that i will process everything in the collection until it is empty?

    class Program
    {
        public static readonly BlockingCollection<User> users = new BlockingCollection<User>();

        static void Main(string[] args)
        {
            for (int i = 0; i < 100000; i++)
            {
                var u = new User {Id = i, Name = "user " + i};
                users.Add(u);
            }

            Run(); 
        }

        static void Run()
        {
            for (int i = 0; i < 100; i++)
            {
                Task.Factory.StartNew(Process, TaskCreationOptions.LongRunning);
            }
        }

        static void Process()
        {
            foreach (var user in users.GetConsumingEnumerable())
            {
                Console.WriteLine(user.Id);
            }
        }
    }

    public class User
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }
DarthVader
  • 52,984
  • 76
  • 209
  • 300

1 Answers1

7

A few small things

  1. You never called CompleteAdding, by not doing that your consuming foreach loops will never complete and hang forever. Fix that by doing users.CompleteAdding() after the initial for loop.
  2. You never wait for the work to finish, Run() will spin up your 100 threads (which likely WAY too much unless your real process involves a lot of waiting for uncontested resources). Because Tasks are not foreground threads they will not keep your program open when your Main exits. You need a CountdownEvent to track when everything is done.
  3. You don't start up your consumers till after your producer has finished all of it's work, you should spin off the producer in to a separate thread or start the consumers first so they are ready to work while you populate the producer on the main thread.

here is a updated version of the code with the fixes

class Program
{
    private const int MaxThreads = 100; //way to high for this example.
    private static readonly CountdownEvent cde = new CountdownEvent(MaxThreads);
    public static readonly BlockingCollection<User> users = new BlockingCollection<User>();

    static void Main(string[] args)
    {
        Run(); 

        for (int i = 0; i < 100000; i++)
        {
            var u = new User {Id = i, Name = "user " + i};
            users.Add(u);
        }
        users.CompleteAdding();
        cde.Wait();
    }

    static void Run()
    {
        for (int i = 0; i < MaxThreads; i++)
        {
            Task.Factory.StartNew(Process, TaskCreationOptions.LongRunning);
        }
    }

    static void Process()
    {
        foreach (var user in users.GetConsumingEnumerable())
        {
            Console.WriteLine(user.Id);
        }
        cde.Signal();
    }
}

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

For the "Best number of threads" like I said earlier, it really depends on what you are waiting on.

If what you are processing is CPU bound, the optimum number of threads is likely Enviorment.ProcessorCount.

If what you are doing is waiting on a external resource, but new requests do not affect old requests (for example asking 20 different servers for information, server the load on server n does not affect the load on server n+1) in that case I would let Parallel.ForEach just choose the number of threads for you.

If you are waiting on a resource that is contended (for example reading/writing to a hard disk) you will want to not use very many threads at all (perhaps even only use one). I just posted a answer in another question about that, when reading in from the hard disk, you should only just use one thread at a time so the hard drive is not jumping around all over trying to complete all the reads at once.

Community
  • 1
  • 1
Scott Chamberlain
  • 124,994
  • 33
  • 282
  • 431
  • how can u call cde.Signal(); from Process(), do u mean to make that a static readonly instance? – DarthVader Jun 21 '13 at 20:59
  • Yes I did, I wrote the code here, not in a debugger. I will correct. – Scott Chamberlain Jun 21 '13 at 21:05
  • I am working with IO , ie: getting user info from exchange/AD, but I have so many users that I need some parallelism, would this model be efficient? or would u recommend another approach? – DarthVader Jun 21 '13 at 21:07
  • 1
    You do not want to bog down the AD server by flooding it with too many requests, so I would only use a few threads. Also be aware, [whichever](http://msdn.microsoft.com/en-us/library/system.directoryservices.accountmanagement.principalcontext.aspx) [context](http://msdn.microsoft.com/en-us/library/system.directoryservices.directoryentry.aspx) you use to talk to the server is likely not thread safe, so you will need to create a new context per thread. Depending on what you are doing with the data the best appoch may be to queue up the work and just do one thread working on the queue. – Scott Chamberlain Jun 21 '13 at 21:13
  • Are you sure OP's code will start 100 threads? I thought the number of threads started was up to the scheduler (in this case the ThreadPool), and that would limit the number of threads to a reasonable value. – Asad Saeeduddin Jun 05 '15 at 12:45