I have a question about this that I haven't found a satisfactory answer to yet.
In time-series data, the order in which messages arrive is crucial. Say a downstream consumer runs a Python script that computes windowed statistics on time-series data. Suppose the topic has multiple partitions. Ordering is guaranteed only within each partition, so we have no control over the order of messages across partitions.
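To make the concern concrete, here is a minimal simulation in plain Python with no real broker involved. The round-robin assignment, the partition-by-partition read, and the function names are all assumptions for illustration; an actual producer or consumer may distribute and fetch messages differently.

```python
from collections import defaultdict


def distribute_round_robin(messages, num_partitions):
    """Spread messages over partitions in turn (hypothetical keyless producer)."""
    partitions = defaultdict(list)
    for i, msg in enumerate(messages):
        partitions[i % num_partitions].append(msg)
    return partitions


def read_partition_by_partition(partitions):
    """Simulate a consumer that drains one partition after another."""
    consumed = []
    for partition_id in sorted(partitions):
        consumed.extend(partitions[partition_id])
    return consumed


# Six data points produced in timestamp order t = 0..5.
messages = [{"t": t, "value": t * 10} for t in range(6)]

partitions = distribute_round_robin(messages, num_partitions=3)
consumed = read_partition_by_partition(partitions)
timestamps = [m["t"] for m in consumed]

print(timestamps)  # [0, 3, 1, 4, 2, 5]
```

Each partition is internally ordered (`[0, 3]`, `[1, 4]`, `[2, 5]`), yet the combined stream the consumer sees is not, which is what would break a windowed computation.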
So how can we make sure that each batch of messages the consumer receives contains all the data points, with nothing missing? The obvious approach is to use a single partition, but that comes at the cost of scalability.
Would it be possible to solve such cases without limiting the scalability of the system?