Using TPL Dataflow: How to ensure items of the same category are processed in the order they arrive?

Question

In researching a problem I’ve stumbled on TPL Dataflow objects, and they seem like ideal tools for a solution I’m building, except I can’t see how to process the messages within each category in sequence.

Let’s say I’m consuming messages from some broker. One of the properties of the message is a category/grouping number. I can process messages for Categories 1, 2 and 3 in parallel, but I must process all messages for each category in the order they arrive.

There could be 20k to 50k categories active at any time. A category can remain active for 12 months (rarely longer, though some stay active for years).

The messages have no usable sequence number or timestamp.

The 1000 messages arriving in a given minute could be:

- each for a different category;
- mostly for unique categories, with a large portion spread randomly over a selection of categories; or
- grouped, almost as if in batches, across a couple of hundred different categories.

A given category can be highly active for anywhere from 5 minutes to several hours (receiving thousands of messages over that time), then taper off to infrequent occurrences for days or weeks before becoming highly active again. After a number of months we may never see a category again.

Can TPL Dataflow process messages from different categories in parallel while processing messages in each category sequentially? If so, how?

What message broker are you using? Most message brokers support what you are asking for; it's a common use case. – Andrew Williamson, Dec 16 '20 at 22:38
Related: [Send parallel requests but only one per host](https://stackoverflow.com/questions/57022754/send-parallel-requests-but-only-one-per-host-with-httpclient-and-polly-to-gracef). Alternatively you could consider using a [keyed lock or semaphore](https://stackoverflow.com/questions/31138179/asynchronous-locking-based-on-a-key). – Theodor Zoulias, Dec 16 '20 at 22:42
You should use a queue per category and not start the next item until the current one is finished. – Jeroen van Langen, Dec 17 '20 at 00:21
Dataflow blocks preserve the order of all messages, even when DOP > 1. If you have a `TransformBlock` with DOP > 1, the output T2 messages will be in the same order the input T1 messages arrived. If it has to, a block will delay emitting a message, e.g. M3, until the previous messages M1 and M2 are emitted. – Panagiotis Kanavos, Dec 17 '20 at 14:48
Given that blocks preserve order, what's the actual problem you want to solve? – Panagiotis Kanavos, Dec 18 '20 at 09:53
You'll have to explain what's going on, and why the order wasn't preserved, to get an answer. – Panagiotis Kanavos, Dec 21 '20 at 19:52
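
For illustration only, here is a minimal sketch of one way the "queue per category" suggestion above could be approximated with Dataflow blocks. It is not taken from any comment or answer in this thread. Instead of one queue per category, it hashes each category onto a fixed set of `ActionBlock<T>` instances, each with `MaxDegreeOfParallelism = 1`, so messages of the same category always land on the same block and run one at a time in arrival order. The output-order guarantee of a single block with DOP > 1 does not by itself stop two messages of the same category from being processed concurrently, which is why the sketch partitions. The `Message` type, the `CategoryPartitioner` class, the partition count, and the handler are all hypothetical names and values, and the code assumes the `System.Threading.Tasks.Dataflow` package is referenced and that a single consumer loop posts messages in arrival order.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical message shape: the question only says a message carries a category number.
public sealed class Message
{
    public Message(int category, string payload) { Category = category; Payload = payload; }
    public int Category { get; }
    public string Payload { get; }
}

// Hypothetical helper: routes each category to one of N single-threaded blocks.
public sealed class CategoryPartitioner
{
    private readonly ActionBlock<Message>[] _partitions;

    public CategoryPartitioner(int partitionCount, Func<Message, Task> handler)
    {
        // Each block processes its own items one at a time, in FIFO order.
        _partitions = Enumerable.Range(0, partitionCount)
            .Select(_ => new ActionBlock<Message>(
                handler,
                new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 }))
            .ToArray();
    }

    // Must be called from a single producer, in arrival order.
    public bool Post(Message message)
    {
        // Mask the sign bit so negative category numbers still map to a valid index.
        int index = (message.Category & int.MaxValue) % _partitions.Length;
        return _partitions[index].Post(message);
    }

    // Signals that no more messages will arrive and waits for queued work to drain.
    public Task CompleteAsync()
    {
        foreach (var partition in _partitions) partition.Complete();
        return Task.WhenAll(_partitions.Select(p => p.Completion));
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Assumed values: 4 partitions and a 10 ms delay standing in for real work.
        var partitioner = new CategoryPartitioner(4, async m =>
        {
            await Task.Delay(10);
            Console.WriteLine($"category {m.Category}: {m.Payload}");
        });

        for (int i = 0; i < 9; i++)
            partitioner.Post(new Message(category: i % 3, payload: $"msg {i}"));

        await partitioner.CompleteAsync();
    }
}
```

Lines from different categories may interleave in the output, but the lines for any one category appear in the order the messages were posted. The trade-off is that categories sharing a partition are also serialized against each other, so a slow category can delay its neighbours. A dedicated block per category would avoid that, but with 20k to 50k categories it would probably need some way to create blocks on demand and retire idle ones.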
