
Kafka receives orders from different countries.

I need to group these orders by country. Should I create a separate topic per country, named after the country, or have one topic with different partitions?

Another way would be to have one topic and use Kafka Streams to filter the orders and send each one to its country-specific topic.

Which is better if the number of countries is over 180?

I want to distribute orders across executors who are located in a specific country/city.

Remark:

Each order has data about its country/city. Kafka must then find the executors in that country/city and send them that order.

Giorgos Myrianthous

1 Answer


tl;dr

In your case, I would create one topic `countries` and use the `country_id` or `country_name` as the message key, so that messages for the same country are placed in the same partition. This way, each partition will contain the data for one specific country, or for several countries, depending on how many partitions the topic has.
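
To illustrate how a message key pins a country to a partition, here is a minimal Python sketch. It does not talk to Kafka: it only models the "hash the key, take it modulo the partition count" idea. The partition count of 12, the sample orders, and the `country-N` keys are all made up for illustration, and CRC32 is only a stand-in for the murmur2 hash that Kafka's default partitioner applies to the key bytes, so the partition numbers will not match a real cluster.

```python
import zlib

NUM_PARTITIONS = 12  # assumption: partition count of the `countries` topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition as hash(key bytes) modulo the partition count.

    CRC32 is a stand-in for the murmur2 hash used by Kafka's default
    partitioner, so real partition numbers will differ.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical orders; the country code plays the role of the message key.
orders = [
    {"order_id": 1, "country": "GR"},
    {"order_id": 2, "country": "DE"},
    {"order_id": 3, "country": "GR"},
    {"order_id": 4, "country": "BR"},
]

# Group order ids by the partition their key maps to.
by_partition = {}
for order in orders:
    partition = partition_for(order["country"])
    by_partition.setdefault(partition, []).append(order["order_id"])

# 180 hypothetical country keys spread over only NUM_PARTITIONS partitions.
country_keys = [f"country-{i}" for i in range(180)]
used_partitions = {partition_for(k) for k in country_keys}

print(by_partition)
print(f"{len(country_keys)} keys share {len(used_partitions)} partitions")
```

Both "GR" orders always end up in the same partition, and the 180 keys necessarily share the available partitions, so a consumer reading one partition may see orders for several countries and should still check the key.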


I would say this decision depends on multiple factors:

  • Logic/Separation of Concerns: You can decide whether to use multiple topics or multiple partitions based on the logic you are trying to implement. Normally, you need distinct topics for distinct entities. For example, say you want to stream users and companies. It doesn't make much sense to create a single topic with two partitions where the first partition holds users and the second one holds companies. Also, dedicating the partitions of a single topic to different entities won't allow you to implement e.g. message ordering per user, which can only be achieved using keyed messages (messages with the same key are placed in the same partition).

  • Host storage capabilities: A partition must fit in the storage of the host machine, while a topic can be distributed across the whole Kafka cluster by splitting it into multiple partitions. The Kafka docs shed some more light on this:

    The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can handle an arbitrary amount of data. Second they act as the unit of parallelism—more on that in a bit.

  • Throughput: If you have high throughput, it makes more sense to create different topics per entity and split them into multiple partitions so that multiple consumers can join the consumer group. Don't forget that the level of parallelism in Kafka is defined by the number of partitions (and obviously active consumers).

  • Retention Policy: Message retention in Kafka works on partition/segment level and you need to make sure that the partitioning you've made in conjunction with the desired retention policy you've picked will support your use case.
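
The throughput point above, that parallelism is capped by the number of partitions, can be shown with a small Python sketch. It is a simplified model of a consumer group, not Kafka's actual assignment logic (the real assignors are more involved): partitions are simply dealt out round-robin, and the names `assign_round_robin`, `idle_consumers`, and `c1`..`c6` are hypothetical.

```python
from typing import Dict, List


def assign_round_robin(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Deal partitions out to consumers one at a time, like cards."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def idle_consumers(assignment: Dict[str, List[int]]) -> List[str]:
    """Return the consumers that received no partition at all."""
    return [c for c, parts in assignment.items() if not parts]


# Two consumers on a 4-partition topic: each one reads two partitions.
few = assign_round_robin(4, ["c1", "c2"])

# Six consumers on the same topic: only four can do any work.
many = assign_round_robin(4, ["c1", "c2", "c3", "c4", "c5", "c6"])

print(few)                   # {'c1': [0, 2], 'c2': [1, 3]}
print(idle_consumers(many))  # ['c5', 'c6']
```

Since each partition is read by at most one consumer in a group, adding consumers beyond the partition count only adds idle members, which is why the partition count should be chosen with the expected consumer parallelism in mind.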

Giorgos Myrianthous
  • Thank you. What if I have a recommendations topic where I store film recommendations for users? How do I bind a user to this topic, so that I can retrieve the recommendations for a specific user_id? I can create a partition keyed by user_id. But what if there are over 100 000 users? Kafka says not to create more than 10 000 partitions in a topic. How do I solve this? –  May 16 '20 at 18:03
  • And what if I have an Orders entity and want to classify these orders by some parameters? Should these be different classifier topics, or partitions? –  May 16 '20 at 18:04
  • @AliceMessis In your case, I would create one topic `countries` and use the `country_id` or `country_name` as the message key, so that messages for the same country are placed in the same partition. This way, each partition will contain the data for one specific country (or several countries, it depends). – Giorgos Myrianthous May 16 '20 at 18:05
  • How do I classify orders by parameters? For example, a user can subscribe to orders where the price is more than 300 points –  May 16 '20 at 18:08
  • Hi @AliceMessis, please remember that Stack Overflow is meant to have one question and one answer. Adding more and more questions in the comments is discouraged unless they help you understand the answer given to you. For future questions, I recommend providing as much information and as many of your concerns as possible in your original post. This helps people give you comprehensive answers. – Michael Heil May 16 '20 at 18:26