
Currently I'm working on a pipeline data flow where each stage except stage 1 is an asynchronously running consumer and producer. I have objects "flowing" through my pipeline, each of which references an item. In Stage 3 I would like to create a loop and buffer all objects that meet a special condition (Stage Loop).

If new objects come in (Stage 3) while other objects are currently buffered (Stage Loop), I would like to check whether they reference the same item and, if so, post them to the BufferBlock of Stage Loop.

The question is: how can I check the referenced item of all objects in Stage Loop from within Stage 3?

The pipeline kinda looks like this:

Incoming objects ->  
  BufferBlock1 -> Parsing (Stage2) ->  
  BufferBlock2 -> Processing (Stage3) ->
  BufferBlock3 -> Stage Loop ->  
    Back to BufferBlock 2
VMAtm
Peter
  • This might be relevant: [How to mark a TPL dataflow cycle to complete?](https://stackoverflow.com/questions/26130168/how-to-mark-a-tpl-dataflow-cycle-to-complete) – Theodor Zoulias Jun 25 '20 at 20:30

1 Answer


You really don't need that many BufferBlocks in your chain. TPL Dataflow contains a TransformBlock, which encapsulates the BufferBlock and ActionBlock logic and has an output buffer for handled messages.

As for the loop, you can link the blocks to each other with the static LinkTo extension method, which accepts a filter predicate, so this could look like
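As a minimal sketch of that point, the chain below uses only TransformBlock and ActionBlock instances, relying on each block's built-in input queue instead of separate BufferBlocks. The stage bodies (parsing a string to an int, then doubling it) and the class and method names are hypothetical placeholders, not the asker's actual logic.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TransformChainSketch
{
    // Hypothetical stages: parse a string into an int, then double it.
    public static async Task<List<int>> RunAsync(IEnumerable<string> input)
    {
        var results = new List<int>();

        // Each block owns its own input queue, so no BufferBlock sits in between.
        var parse = new TransformBlock<string, int>(s => int.Parse(s));
        var process = new TransformBlock<int, int>(n => n * 2);
        // ActionBlock runs one message at a time by default, so List.Add is safe here.
        var collect = new ActionBlock<int>(n => results.Add(n));

        // Propagate completion so that completing the head drains the whole chain.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        parse.LinkTo(process, link);
        process.LinkTo(collect, link);

        foreach (var item in input)
            await parse.SendAsync(item);

        parse.Complete();
        await collect.Completion;
        return results;
    }
}
```

Calling `await TransformChainSketch.RunAsync(new[] { "1", "2", "3" })` pushes three messages through both stages and returns once the final block has completed.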

stage2.LinkTo(stage3, CheckForExistingProcessing);
stage2.LinkTo(stage4);

Here stage4 is a queue for messages that didn't pass the check and must be handled in a loop. You can set up an additional ActionBlock or, maybe, simply use a TransformBlock to send the messages back to the appropriate stage. I think you can also introduce a retry limit, as some messages probably can't be processed at all for some reason.
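The routing described above can be sketched as follows. This is only an illustration under assumptions: the `Message` type, the `busyItems` set, and the body of `CheckForExistingProcessing` are hypothetical, and stage3 and stage4 are reduced to blocks that record which messages reached them. It relies on the fact that a source offers each message to its links in the order they were created, so the unfiltered second link only receives what the first one declined.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class FilteredLinkSketch
{
    // Hypothetical message type: an object that references an item by key.
    public sealed class Message
    {
        public Message(int id, string itemKey) { Id = id; ItemKey = itemKey; }
        public int Id { get; }
        public string ItemKey { get; }
    }

    public static async Task<(List<int> Processed, List<int> Deferred)> RunAsync(
        IEnumerable<Message> input, ISet<string> busyItems)
    {
        var processed = new List<int>();
        var deferred = new List<int>();

        var stage2 = new BufferBlock<Message>();
        var stage3 = new ActionBlock<Message>(m => processed.Add(m.Id));
        var stage4 = new ActionBlock<Message>(m => deferred.Add(m.Id));

        // Assumed check: a message may proceed only if its item is not busy.
        Predicate<Message> CheckForExistingProcessing =
            m => !busyItems.Contains(m.ItemKey);

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        stage2.LinkTo(stage3, link, CheckForExistingProcessing); // tried first
        stage2.LinkTo(stage4, link);                             // catches the rest

        foreach (var m in input)
            await stage2.SendAsync(m);

        stage2.Complete();
        await Task.WhenAll(stage3.Completion, stage4.Completion);
        return (processed, deferred);
    }
}
```

Note that the predicate is evaluated once, at the moment the message is offered, so the sketch does not address state that changes between the check and the processing.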

Also, as you've said that you have async logic, you probably should SendAsync messages rather than Post them (you can also use the overload with CancellationToken):

// asynchronously wait until the target accepts or declines the message
await stage1.SendAsync(m);
// same, but the wait can be cancelled through the token
await stage2.SendAsync(m, token);

The Post method is synchronous and returns false immediately, dropping the message, if the target doesn't accept it, whereas SendAsync waits asynchronously and still delivers the message if the target cannot accept it right now but can later.
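The difference only becomes visible when the target can actually be full. A minimal sketch, assuming a BufferBlock with a BoundedCapacity of 1 (a value chosen purely for illustration, as are the class and method names):

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PostVsSendAsyncSketch
{
    public static async Task<(bool Posted, bool Sent)> RunAsync()
    {
        // Bounded buffer with room for a single message (assumed capacity).
        var target = new BufferBlock<int>(
            new DataflowBlockOptions { BoundedCapacity = 1 });

        target.Post(1);                 // accepted, the buffer is now full
        bool posted = target.Post(2);   // rejected at once, message 2 is lost

        // SendAsync does not give up: the task stays pending while the buffer is full.
        Task<bool> pending = target.SendAsync(3);

        target.Receive();               // takes message 1 out and frees the slot
        bool sent = await pending;      // completes once message 3 has been accepted

        return (posted, sent);
    }
}
```

With unbounded blocks (the default) both calls succeed immediately, so the choice mainly matters once BoundedCapacity is set to apply back-pressure.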

VMAtm
  • I think I would run into a timing issue: object1 comes in, is declined by CheckForExistingProcessing and gets forwarded to stage4, while object2 comes in and passes the check because the lock on the item that both objects reference has been released in the meantime. So I would need to check whether there are objects in stage4 that reference the same item before even doing the check. What I need is some kind of queue that postpones objects based on a check for a lock (and on whether objects for the same item are already postponed), but keeps each object until the lock is available again. – Peter Dec 22 '16 at 13:52
  • You can try the BlockingCollection then on this step. – VMAtm Dec 22 '16 at 15:48
  • I was able to eliminate the loop and the locking from my pipeline, and I already created a test class for it. One question though: I need to import the files in the order they come in, and I built the pipeline on the basis of "one file at a time". How can I make sure the files are handled in the order they come in if I call SendAsync? I could add a BufferBlock before the pipeline, whose consumer makes sure to keep the order, but is there a more intelligent way? – Peter Jan 11 '17 at 13:12