2

I am using a BlockingCollection to process some files and then upload them to a server.

Right now I have a single Producer that recurses the file system and compresses certain files to a temporary location. Once it has finished with a file, it adds my own object to the BlockingCollection containing information such as file name, file path, modified date, etc. The Consumer then grabs this object and uses it to upload the file. When the Producer has finished searching the file system and working on files, it calls the BlockingCollection.CompleteAdding() method to signal the Consumer that it has finished.
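To make the setup concrete, here is a minimal sketch of that single-producer pipeline. `FileItem` is a hypothetical stand-in for my own info object, and the recursion/compression step is elided:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical item type standing in for the real file-info object.
record FileItem(string Name, string Path, DateTime Modified);

class Pipeline
{
    static void Main()
    {
        var queue = new BlockingCollection<FileItem>();

        // Producer: recurse, compress, then enqueue metadata (work elided).
        var producer = Task.Run(() =>
        {
            foreach (var name in new[] { "a.txt", "b.txt" })
                queue.Add(new FileItem(name, "/tmp/" + name, DateTime.UtcNow));
            queue.CompleteAdding(); // signals the consumer that no more items will arrive
        });

        // Consumer: GetConsumingEnumerable blocks until items arrive and
        // completes once CompleteAdding has been called and the queue drains.
        var consumer = Task.Run(() =>
        {
            foreach (var item in queue.GetConsumingEnumerable())
                Console.WriteLine($"uploading {item.Name}");
        });

        Task.WaitAll(producer, consumer);
    }
}
```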

What I would like to do is increase the number of Producers to 2 or more, because the compression process takes a while and on multi-core processors I'm only taking advantage of 1 core. This causes the Producer to sometimes fall behind the Consumer on faster networks.

My question is: when I have multiple Producers and only one Consumer, how can I signal the Consumer that all of the Producers have finished their work? If I call the BlockingCollection.CompleteAdding() method from one of the Producers, one or more of the other Producers could still be working.

forcedfx
    It might be more beneficial to offload the compression to the consumer and make the producer as thin as possible. – M.Babcock Feb 08 '12 at 14:10
  • Agree with Babcock; recursing the directory tree in parallel would only clog the filesystem. It's better to add the file locations you want to process to the queue and then compress and send in parallel. Keep an eye on your HDD throughput: reading in multiple places at the same time forces the drive to move its heads across the disk to different sectors, and in that time it is not reading data, which lowers throughput. Ideally you would still read on 1 thread, read each file into a byte[], and then compress and send those in parallel. – gjvdkamp Feb 08 '12 at 15:09
  • Sorry, I should have been clearer in my original post. There is only one recursion thread that will feed 2 or more compression (Producer) tasks, which will then feed a single Consumer task. I'm using high compression, which takes quite a bit of time even on fast processors: writing at most about 2 MB of data per second per thread. – forcedfx Feb 08 '12 at 15:51

2 Answers

2

You can use a semaphore, shared by all the Producer instances, to decide which producer calls BlockingCollection.CompleteAdding(): only the last producer to finish calls it. The semaphore can be implemented as a simple shared counter: increment it when a Producer is created and decrement it atomically when a producer ends its job. Whichever producer brings the counter to zero then calls BlockingCollection.CompleteAdding().
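A minimal sketch of this idea, assuming a plain string payload and using an `Interlocked` counter as the shared "semaphore" (the item counts and names are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class MultiProducer
{
    static void Main()
    {
        var queue = new BlockingCollection<string>();
        int remainingProducers = 3; // set once, before any producer starts

        var producers = Enumerable.Range(0, 3).Select(id => Task.Run(() =>
        {
            for (int i = 0; i < 5; i++)
                queue.Add($"item {id}-{i}"); // compress + enqueue in the real code

            // The last producer to finish closes the collection.
            if (Interlocked.Decrement(ref remainingProducers) == 0)
                queue.CompleteAdding();
        })).ToArray();

        // Single consumer: ends when the queue is drained and marked complete.
        foreach (var item in queue.GetConsumingEnumerable())
            Console.WriteLine($"uploading {item}");

        Task.WaitAll(producers);
    }
}
```

The key point is that the decrement-and-test must be atomic; a plain `remainingProducers--` followed by a comparison could let two producers race past zero, or let none of them see zero.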

Massimo Zerbini
0

I use something like this to have multiple producers and consumers. It is just a very simple solution and is not optimized for production code.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public class ManageBatchProcessing
{
    private BlockingCollection<Action> blockingCollection;

    public void Process()
    {
        blockingCollection = new BlockingCollection<Action>();
        int numberOfBatches = 10;
        Process(HandleProducers, HandleConsumers, numberOfBatches);
    }

    private void Process(Action<int> produce, Action<int> consume, int numberOfBatches)
    {
        produce(numberOfBatches);
        consume(numberOfBatches);
    }

    private void HandleConsumers(int numberOfBatches)
    {
        var consumers = new List<Task>();

        for (var i = 1; i <= numberOfBatches; i++)
        {
            consumers.Add(Task.Factory.StartNew(() =>
            {
                foreach (var action in blockingCollection.GetConsumingEnumerable())
                {
                    action();
                }
            }));
        }

        Task.WaitAll(consumers.ToArray());
    }

    private void HandleProducers(int numberOfBatches)
    {
        var producers = new List<Task>();

        for (var i = 1; i <= numberOfBatches; i++)
        {
            producers.Add(Task.Factory.StartNew(() =>
            {
                blockingCollection.Add(() => YourProducerMethod());
            }));
        }

        Task.WaitAll(producers.ToArray());
        blockingCollection.CompleteAdding();
    }
}
Akbar
  • So you are waiting for all the producers to finish before starting your consumers? This defeats the purpose of the producer/consumer pattern. – Theodor Zoulias Jul 07 '19 at 19:54