We have the following problem. We parse files (the producer) and convert the data into a C# data structure. Afterwards we need to merge all of this data together. Since this can be done in parallel, we started implementing a producer-consumer pattern, but we are stuck on how to merge the results in an optimized manner.
The producer produces five data elements (named as follows):
1, 2, 3, 4, 5
The merges can be done in any order; as soon as two elements have been created, they can be merged.
Example:
(1)and(2), (3)and(4), (12)and(34), (1234)and(5)
Data data = new Data();
BlockingCollection<Data> collection = new BlockingCollection<Data>();

Task consumer = Task.Factory.StartNew(() =>
{
    // GetConsumingEnumerable ends cleanly once CompleteAdding has been
    // called; checking IsCompleted before Take races with the producer
    // (Take throws if the collection completes in between).
    foreach (var item in collection.GetConsumingEnumerable())
    {
        data.Merge(item);
    }
});

Task producer = Task.Factory.StartNew(() =>
{
    Parallel.ForEach(files, file =>
    {
        collection.Add(new Data(file));
    });
    collection.CompleteAdding();
});

Task.WaitAll(consumer, producer);
// here the data of all files has been merged
return data;
This code works, but it has a problem: in our case the producer is much faster than the consumer, so we need multiple consumers running in parallel. Each consumer should wait until two items are in the queue, take both, merge them, and put the result back into the queue. Is there a known pattern for this kind of merge?
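One caveat with the "take two, merge, put back" idea: if every consumer takes one item and then blocks waiting for a partner, they can all end up holding one item each over an empty queue, which deadlocks. A well-known shape that avoids this is parallel aggregation (map-reduce style): each consumer merges into its own local accumulator, and the per-consumer accumulators are merged once at the end. Below is a minimal sketch of that idea; `Data` here is a hypothetical stand-in for the real type (assuming `Merge` is associative, since the order of merges does not matter), and `MergeAll` is an illustrative name, not an existing API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the real Data type.
class Data
{
    public string Payload = "";
    public Data() { }
    public Data(string file) { Payload = file; }
    public void Merge(Data other) { Payload += other.Payload; }
}

static class ParallelMergeSketch
{
    // Each consumer merges into its own local accumulator, so no consumer
    // ever waits for a second queue item; the per-consumer results are
    // merged once at the end (consumerCount cheap final merges).
    public static Data MergeAll(string[] files, int consumerCount)
    {
        var queue = new BlockingCollection<Data>();

        var producer = Task.Run(() =>
        {
            Parallel.ForEach(files, file => queue.Add(new Data(file)));
            queue.CompleteAdding();
        });

        var consumers = Enumerable.Range(0, consumerCount)
            .Select(_ => Task.Run(() =>
            {
                var local = new Data();
                // Ends cleanly once CompleteAdding has been called.
                foreach (var item in queue.GetConsumingEnumerable())
                    local.Merge(item);
                return local;
            }))
            .ToArray();

        producer.Wait();
        Task.WaitAll(consumers);

        // Sequential final phase: merge the few per-consumer accumulators.
        var result = new Data();
        foreach (var c in consumers)
            result.Merge(c.Result);
        return result;
    }
}
```

The same pattern is built into `Parallel.ForEach` via the `localInit`/`localFinally` overload (thread-local accumulator plus a final merge step), which would let you drop the explicit queue entirely if the parsing itself can run inside the loop body.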