In researching a problem I’ve stumbled on TPL Dataflow objects, and they seem like ideal tools for a solution I’m building, except I can’t see how to process categories of messages in sequence.
Let’s say I’m consuming messages from some broker. One of the properties of the message is a category/grouping number. I can process messages for Categories 1, 2 and 3 in parallel, but I must process all messages for each category in the order they arrive.
There could be 20k – 50k categories active at any time. A category could remain active for 12 months (rarely longer, though some stay active for years).
The messages have no usable sequence number or timestamp.
The 1000 messages arriving in any one minute could be:
- Each for a different category
- Mostly for unique categories, with a large portion randomly spread over a selection of categories
- Grouped so that they appear almost as batches for a couple of hundred different categories
A given category can be highly active for 5 minutes to several hours (receiving thousands of messages over that time), then taper off to infrequent occurrences for days or weeks before becoming highly active again. After a number of months we may never see a category again.
Can TPL Dataflow process messages from different categories in parallel while processing messages in each category sequentially? And if so, how?
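
To make the requirement concrete, here is a minimal sketch of one shape this could take: a fixed number of single-threaded `ActionBlock`s, with each message routed by its category number so that a given category always lands on the same block. The `Message` record, the `CategoryPartitioner` class, and the partition count are all hypothetical names and choices for illustration, not part of any existing code, and I don't know whether this is the recommended approach or how well it behaves with 20k – 50k categories.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical message shape; only the category number comes from the description above.
public sealed record Message(int Category, string Payload);

// Hypothetical helper: routes each message to one of N sequential blocks.
public sealed class CategoryPartitioner
{
    private readonly ActionBlock<Message>[] _partitions;

    public CategoryPartitioner(int partitionCount, Action<Message> handler)
    {
        _partitions = new ActionBlock<Message>[partitionCount];
        for (int i = 0; i < partitionCount; i++)
        {
            // MaxDegreeOfParallelism = 1 (also the default) means each block
            // handles one message at a time, in the order they were posted.
            _partitions[i] = new ActionBlock<Message>(
                handler,
                new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });
        }
    }

    public bool Post(Message message)
    {
        // The same category always maps to the same partition, so its
        // messages stay in arrival order; different partitions run in parallel.
        int index = (int)((uint)message.Category % (uint)_partitions.Length);
        return _partitions[index].Post(message);
    }

    public Task CompleteAsync()
    {
        // Signal that no more messages will arrive, then wait for all blocks to drain.
        foreach (var block in _partitions) block.Complete();
        return Task.WhenAll(Array.ConvertAll(_partitions, b => b.Completion));
    }
}
```

With something like `new CategoryPartitioner(8, Handle)`, categories that share a partition would queue behind each other even though they are unrelated, which is the trade-off I'm unsure about.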