1

In my producer-consumer scenario I have multiple consumers, and each consumer sends an action to external hardware, which may take some time. My pipeline looks roughly like this:

BatchBlock --> TransformBlock --> BufferBlock --> (Several) ActionBlocks

I have set the BoundedCapacity of my ActionBlocks to 1. What I want is for the BatchBlock to send a group of items to the TransformBlock only when one of my ActionBlocks is available for work. Until then the BatchBlock should keep buffering elements and not pass them on to the TransformBlock. My batch sizes are variable. Since a batch size is mandatory, I have set a very high upper limit for the BatchBlock batch size, but I do not want to reach that limit; I would like batches to be triggered by the availability of the ActionBlocks performing the task.

I have achieved this with the TriggerBatch() method: I call BatchBlock.TriggerBatch() as the last action in each ActionBlock. However, after several days of working properly, the pipeline came to a halt. On checking, I found that sometimes the inputs to the BatchBlock arrive after the ActionBlocks have finished their work. In this case the ActionBlocks do call TriggerBatch() at the end of their work, but since there is no input in the BatchBlock at that point, the call has no effect. When inputs later flow into the BatchBlock, there is no one left to call TriggerBatch() and restart the pipeline. I was looking for a way to check whether anything is present in the input buffer of the BatchBlock, but no such feature is available; I also could not find a way to check whether a TriggerBatch() call actually produced a batch.
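The lost trigger described above can be reproduced in isolation. The following is a minimal sketch, assuming a default (greedy) BatchBlock with no linked targets; the class name `LostTriggerDemo` is only for illustration and is not part of my real pipeline.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

static class LostTriggerDemo
{
    static void Main()
    {
        // Same large batch size as in the pipeline, so batches only form on demand.
        var batch = new BatchBlock<int>(1000);

        // A worker finishes and asks for a batch, but nothing is buffered yet:
        // the call produces no batch and the request is not remembered.
        batch.TriggerBatch();
        Console.WriteLine("After early trigger: {0}", batch.OutputCount);

        // Items arrive afterwards. They stay in the input buffer because
        // no worker is left to call TriggerBatch() again.
        batch.Post(1);
        batch.Post(2);
        Console.WriteLine("After late posts:    {0}", batch.OutputCount);

        // Only another explicit trigger emits the two buffered items as one batch.
        batch.TriggerBatch();
        Console.WriteLine("After second trigger: {0}", batch.OutputCount);
    }
}
```

In the real pipeline that last call never happens, which is why everything stalls.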

Could anyone suggest a possible solution to my problem? Unfortunately, using a timer to trigger batches is not an option for me. Except at the start of the pipeline, the throttling should be governed only by the availability of one of the ActionBlocks.

The example code is here:

    static BatchBlock<int> _groupReadTags;

    static void Main(string[] args)
    {
        _groupReadTags = new BatchBlock<int>(1000);

        var bufferOptions = new DataflowBlockOptions{BoundedCapacity = 2};
        BufferBlock<int> _frameBuffer = new BufferBlock<int>(bufferOptions);
        var consumerOptions = new ExecutionDataflowBlockOptions { BoundedCapacity = 1};
        int batchNo = 1;


        TransformBlock<int[], int> _workingBlock = new TransformBlock<int[], int>(list =>
        {

            Console.WriteLine("\n\nWorking on Batch Number {0}", batchNo);
            //_groupReadTags.TriggerBatch();
            int sum = 0;

            foreach (int item in list)
            {
                Console.WriteLine("Elements in batch {0} :: {1}", batchNo, item);
                sum += item;

            }
            batchNo++;
            return sum;

        });

        ActionBlock<int> _worker1 = new ActionBlock<int>(async x =>
        {
            Console.WriteLine("Number from ONE :{0}", x);
            await Task.Delay(500);
            Console.WriteLine("BatchBlock Output Count : {0}", _groupReadTags.OutputCount);
            _groupReadTags.TriggerBatch();

        }, consumerOptions);

        ActionBlock<int> _worker2 = new ActionBlock<int>(async x =>
        {
            Console.WriteLine("Number from TWO :{0}", x);
            await Task.Delay(2000);
            _groupReadTags.TriggerBatch();

        }, consumerOptions);

        _groupReadTags.LinkTo(_workingBlock);
        _workingBlock.LinkTo(_frameBuffer);
        _frameBuffer.LinkTo(_worker1);
        _frameBuffer.LinkTo(_worker2);

        _groupReadTags.Post(10);
        _groupReadTags.Post(20);
        _groupReadTags.TriggerBatch();

        Task postingTask = new Task(() => PostStuff());
        postingTask.Start();
        Console.ReadLine();

    }



    static void PostStuff()
    {
        for (int i = 0; i < 10; i++)
        {
            _groupReadTags.Post(i);
            Thread.Sleep(100);
        }

        Parallel.Invoke(
            () => _groupReadTags.Post(100),
            () => _groupReadTags.Post(200),
            () => _groupReadTags.Post(300),
            () => _groupReadTags.Post(400),
            () => _groupReadTags.Post(500),
            () => _groupReadTags.Post(600),
            () => _groupReadTags.Post(700),
            () => _groupReadTags.Post(800)
                       );
    }
Ricky
  • Throttling is achieved by setting the proper limits to input, output *and* link options, eg by setting the [DataflowLinkOptions.MaxMessages](https://msdn.microsoft.com/en-us/library/system.threading.tasks.dataflow.dataflowlinkoptions.maxmessages(v=vs.110).aspx) property. The posted code doesn't even propagate completion though - the `LinkTo(source,target)` overload doesn't propagate completion. You need to use [the overload](https://msdn.microsoft.com/en-us/library/hh462705(v=vs.110).aspx) that accepts both link options and a filter predicate – Panagiotis Kanavos Sep 22 '15 at 13:02
  • @PanagiotisKanavos max messages has nothing to do with throttling. – i3arnon Sep 22 '15 at 13:04
  • @i3arnon I'm simply pointing out that throttling should be handled using the dataflow's mechanisms, not try to emulate them from scratch. It's not clear from the question *why* setting bounds doesn't work and the code doesn't help - completion isn't propagated and the only bound is equal to 1 – Panagiotis Kanavos Sep 22 '15 at 13:07
  • @PanagiotisKanavos Thanks for your comment. I do not really wish to propagate completion, since this code should actually run all the time, If I propagate completion I would have to re-initialize all of my TPL blocks and I do not want to do that. Plus I don't have a filter predicate, since I do not wish to conditionally propagate my outputs. The issue here is something else. I would wish to use the TriggerBatch() method, everything does work properly, except the case that I have outlined above, when the input to the Batchblock comes in after all of the TriggerBatch() methods have been called. – Ricky Sep 22 '15 at 13:09
  • re-phrased the question so as not to create any confusion. – Ricky Sep 22 '15 at 13:18

2 Answers

0

I have found that using TriggerBatch in this way is unreliable:

    _groupReadTags.Post(10);
    _groupReadTags.Post(20);
    _groupReadTags.TriggerBatch();

Apparently TriggerBatch is intended to be used inside the block, not outside it like this. I have seen this result in odd timing issues, such as items from the next batch being included in the current batch, even though TriggerBatch was called first.

Please see my answer to the question "BatchBlock produces batch with elements sent after TriggerBatch()" for an alternative using DataflowBlock.Encapsulate.
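To give an idea of what an Encapsulate-based approach can look like, here is a rough sketch. It is not a copy of the linked answer: the helper name `CreateBatchBlock` and the `endsBatch` predicate are hypothetical, chosen only for illustration. The idea is that posting and triggering happen on the same serialized path inside an ActionBlock, so a trigger cannot race with items posted from other threads.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class EncapsulatedBatchSketch
{
    // Hypothetical helper: wraps a BatchBlock behind an ActionBlock that
    // decides, item by item, when the current batch should be closed.
    public static IPropagatorBlock<T, T[]> CreateBatchBlock<T>(
        int batchSize, Predicate<T> endsBatch)
    {
        var batch = new BatchBlock<T>(batchSize);

        // Default ActionBlock options process one item at a time, so Post and
        // TriggerBatch below are never interleaved with another item.
        var input = new ActionBlock<T>(item =>
        {
            batch.Post(item);
            if (endsBatch(item)) batch.TriggerBatch();
        });

        // Forward completion (or a fault) from the front block to the BatchBlock.
        input.Completion.ContinueWith(t =>
        {
            if (t.IsFaulted) ((IDataflowBlock)batch).Fault(t.Exception);
            else batch.Complete();
        });

        return DataflowBlock.Encapsulate(input, batch);
    }

    static async Task Main()
    {
        // Assumption for the demo: multiples of 10 mark the end of a batch.
        var block = CreateBatchBlock<int>(1000, x => x % 10 == 0);
        var printer = new ActionBlock<int[]>(
            b => Console.WriteLine(string.Join(",", b)));
        block.LinkTo(printer, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var i in new[] { 1, 2, 10, 3, 4, 20, 5 }) block.Post(i);
        block.Complete();
        await printer.Completion;
    }
}
```

Whether this fits the question depends on whether the trigger condition can be expressed in terms of the incoming items; a trigger that comes from downstream workers would still need some way to be remembered while the buffer is empty.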

Loren Paulsen
0

Here is an alternative BatchBlock implementation with some extra features. It includes a TriggerBatch method with this signature:

public int TriggerBatch(int nextMinBatchSizeIfEmpty);

If the input queue is not empty, invoking this method triggers a batch immediately. Otherwise it sets a temporary MinBatchSize that affects only the next batch. You could invoke this method with a small value for nextMinBatchSizeIfEmpty to ensure that, if a batch cannot be produced right now, the next batch will be emitted sooner than the BatchSize configured in the block's constructor would allow.

This method returns the size of the batch produced. It returns 0 if the input queue is empty, the output queue is full, or the block has completed.

public class BatchBlockEx<T> : ITargetBlock<T>, ISourceBlock<T[]>
{
    private readonly ITargetBlock<T> _input;
    private readonly IPropagatorBlock<T[], T[]> _output;
    private readonly Queue<T> _queue;
    private readonly object _locker = new object();
    private int _nextMinBatchSize = Int32.MaxValue;

    public Task Completion { get; }
    public int InputCount { get { lock (_locker) return _queue.Count; } }
    public int OutputCount => ((BufferBlock<T[]>)_output).Count;
    public int BatchSize { get; }

    public BatchBlockEx(int batchSize, DataflowBlockOptions dataflowBlockOptions = null)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
        dataflowBlockOptions = dataflowBlockOptions ?? new DataflowBlockOptions();
        if (dataflowBlockOptions.BoundedCapacity != DataflowBlockOptions.Unbounded &&
            dataflowBlockOptions.BoundedCapacity < batchSize)
            throw new ArgumentOutOfRangeException(nameof(batchSize),
            "Number must be no greater than the value specified in BoundedCapacity.");

        this.BatchSize = batchSize;

        _output = new BufferBlock<T[]>(dataflowBlockOptions);

        _queue = new Queue<T>(batchSize);

        _input = new ActionBlock<T>(async item =>
        {
            T[] batch = null;
            lock (_locker)
            {
                _queue.Enqueue(item);
                if (_queue.Count == batchSize || _queue.Count >= _nextMinBatchSize)
                {
                    batch = _queue.ToArray(); _queue.Clear();
                    _nextMinBatchSize = Int32.MaxValue;
                }
            }
            if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);

        }, new ExecutionDataflowBlockOptions()
        {
            BoundedCapacity = 1,
            CancellationToken = dataflowBlockOptions.CancellationToken
        });

        var inputContinuation = _input.Completion.ContinueWith(async t =>
        {
            try
            {
                T[] batch = null;
                lock (_locker)
                {
                    if (_queue.Count > 0)
                    {
                        batch = _queue.ToArray(); _queue.Clear();
                    }
                }
                if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
            }
            finally
            {
                if (t.IsFaulted)
                {
                    _output.Fault(t.Exception.InnerException);
                }
                else
                {
                    _output.Complete();
                }
            }
        }, TaskScheduler.Default).Unwrap();

        this.Completion = Task.WhenAll(inputContinuation, _output.Completion);
    }

    public void Complete() => _input.Complete();
    void IDataflowBlock.Fault(Exception ex) => _input.Fault(ex);

    public int TriggerBatch(Func<T[], bool> condition, int nextMinBatchSizeIfEmpty)
    {
        if (nextMinBatchSizeIfEmpty < 1)
            throw new ArgumentOutOfRangeException(nameof(nextMinBatchSizeIfEmpty));
        int count = 0;
        lock (_locker)
        {
            if (_queue.Count > 0)
            {
                T[] batch = _queue.ToArray();
                if (condition == null || condition(batch))
                {
                    bool accepted = _output.Post(batch);
                    if (accepted) { _queue.Clear(); count = batch.Length; }
                }
                _nextMinBatchSize = Int32.MaxValue;
            }
            else
            {
                _nextMinBatchSize = nextMinBatchSizeIfEmpty;
            }
        }
        return count;
    }

    public int TriggerBatch(Func<T[], bool> condition)
        => TriggerBatch(condition, Int32.MaxValue);

    public int TriggerBatch(int nextMinBatchSizeIfEmpty)
        => TriggerBatch(null, nextMinBatchSizeIfEmpty);

    public int TriggerBatch() => TriggerBatch(null, Int32.MaxValue);

    DataflowMessageStatus ITargetBlock<T>.OfferMessage(
        DataflowMessageHeader messageHeader, T messageValue,
        ISourceBlock<T> source, bool consumeToAccept)
    {
        return _input.OfferMessage(messageHeader, messageValue, source,
            consumeToAccept);
    }

    T[] ISourceBlock<T[]>.ConsumeMessage(DataflowMessageHeader messageHeader,
        ITargetBlock<T[]> target, out bool messageConsumed)
    {
        return _output.ConsumeMessage(messageHeader, target, out messageConsumed);
    }

    bool ISourceBlock<T[]>.ReserveMessage(DataflowMessageHeader messageHeader,
        ITargetBlock<T[]> target)
    {
        return _output.ReserveMessage(messageHeader, target);
    }

    void ISourceBlock<T[]>.ReleaseReservation(DataflowMessageHeader messageHeader,
        ITargetBlock<T[]> target)
    {
        _output.ReleaseReservation(messageHeader, target);
    }

    IDisposable ISourceBlock<T[]>.LinkTo(ITargetBlock<T[]> target,
        DataflowLinkOptions linkOptions)
    {
        return _output.LinkTo(target, linkOptions);
    }
}

Another overload of the TriggerBatch method lets you examine the batch that could currently be produced and decide whether it should be triggered:

public int TriggerBatch(Func<T[], bool> condition);

The BatchBlockEx class does not support the Greedy and MaxNumberOfGroups options of the built-in BatchBlock.
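As a usage illustration, here is a minimal sketch of how a worker from the question could call the new overload. It assumes the BatchBlockEx class above is compiled into the same project; the pipeline is simplified to a single worker, and the delays and item counts are arbitrary values chosen for the demo.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class BatchBlockExUsage
{
    static async Task Main()
    {
        // Requires the BatchBlockEx<T> class defined above.
        var batcher = new BatchBlockEx<int>(1000);
        var sum = new TransformBlock<int[], int>(batch => batch.Sum());

        var worker = new ActionBlock<int>(async x =>
        {
            Console.WriteLine("Working on sum {0}", x);
            await Task.Delay(500); // stands in for the slow hardware call

            // If items are waiting, a batch is produced now. If not, the
            // request is remembered and the next item forms a batch of one.
            int produced = batcher.TriggerBatch(1);
            Console.WriteLine("Batch size produced by trigger: {0}", produced);
        }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        batcher.LinkTo(sum);
        sum.LinkTo(worker);

        // Initial request while the queue is still empty.
        batcher.TriggerBatch(1);

        for (int i = 1; i <= 10; i++)
        {
            // SendAsync waits for room in the bounded input of BatchBlockEx;
            // Post could be declined while the input is busy.
            await batcher.SendAsync(i);
            await Task.Delay(100);
        }

        // Give the worker time to drain the remaining batches before exiting.
        await Task.Delay(3000);
    }
}
```

The point of passing 1 is that a trigger issued while the queue is empty is not lost, which is the case that stalled the original pipeline.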

Theodor Zoulias