I am looking at moving some of my backend to a Google Cloud Pub/Sub and Cloud Dataflow architecture to handle stream processing. One of my main requirements is that messages reach the subscriber in order. Is it possible to pin each Dataflow worker to the topic it subscribes to, so that the worker doesn't lose the order of the messages it is processing?
1 Answer
You can't pin specific workers to specific topics. The purpose of Dataflow is to automate the partitioning, distribution and parallel processing of a workload, not to dedicate individual workers to different workloads.
However, you can apply fixed windowing, follow it with a GroupByKey, and then sort the elements in each group. Workers can't be pinned to specific topics, but you can still group and sort so that the messages of each topic are processed in order. Side inputs are also available if you need to inject additional data while processing each element.
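To make the window, group, sort idea concrete, here is a minimal sketch in plain Python. It is not the Beam API, only an illustration of the logic that a fixed window, a GroupByKey and a per-group sort would apply. The `(topic, timestamp, payload)` tuples, the topic names, the 60-second window and the helper names `window_start` and `order_per_topic` are all assumptions made up for this example.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window size, chosen only for illustration


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the fixed window that contains `timestamp` (seconds)."""
    return timestamp - (timestamp % size)


def order_per_topic(messages, size=WINDOW_SECONDS):
    """Group messages by (topic, window) and sort each group by timestamp.

    `messages` is an iterable of (topic, timestamp, payload) tuples in
    arbitrary arrival order. Returns a dict mapping (topic, window_start)
    to the list of payloads ordered by timestamp.
    """
    groups = defaultdict(list)
    for topic, ts, payload in messages:
        # Windowing + grouping: the key is the topic plus its fixed window.
        groups[(topic, window_start(ts, size))].append((ts, payload))
    # Sorting: order each group's elements by their timestamp.
    return {
        key: [payload for _, payload in sorted(values, key=lambda v: v[0])]
        for key, values in groups.items()
    }


# Hypothetical messages, deliberately out of order on arrival.
arrivals = [
    ("orders", 65, "c"),
    ("orders", 10, "a"),
    ("clicks", 30, "x"),
    ("orders", 40, "b"),
    ("clicks", 5, "w"),
]
result = order_per_topic(arrivals)
for key in sorted(result):
    print(key, result[key])
```

Note that this restores order only within each window, and that sorting a group means holding all of that group's elements in memory at once, so the window size is a trade-off between latency and how much disorder can be corrected.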
I would also suggest checking the documentation for Pub/Sub's message ordering feature.

Neri