Given a stream of different kinds of events, the options are:

  • one big topic containing all events
  • multiple topics for different types of events

Which option would be better?

I understand that when messages are not in the same partition of a topic there is no ordering guarantee between them, but are there any other factors to consider when making this decision?

user3452075

2 Answers

A topic is a logical abstraction and should contain messages of the same type. Say you monitor a website and capture click stream events, and you also have a database that publishes its changes to a changelog topic. You should have two different topics, because click stream events are not related to your database changelog.

This has multiple advantages:

  • your data will have different formats, so you will need different (de)serializers to write and read the data (with a single topic you would need a hybrid serializer and would lose type safety when reading)
  • you will have different consumer applications: one might be interested in click stream events only, a second only in the database changelog, and a third in both. With multiple topics, applications one and two subscribe only to the topics they are interested in; with a single topic, applications one and two must read everything and filter out what they do not need, which increases broker, network, and client load
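
To make the second point concrete, here is a minimal sketch in plain Python (no Kafka client involved) that counts how many messages each application has to receive under the two layouts. The event list, the application names, and both helper functions are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical event stream: (event_type, payload) pairs.
events = [
    ("click", {"x": 10, "y": 20}),
    ("db_change", {"table": "users", "op": "UPDATE"}),
    ("click", {"x": 5, "y": 7}),
    ("db_change", {"table": "orders", "op": "INSERT"}),
    ("click", {"x": 1, "y": 2}),
]

# Which event types each (hypothetical) application cares about.
interests = {
    "app_one": {"click"},
    "app_two": {"db_change"},
    "app_three": {"click", "db_change"},
}


def delivered_single_topic(events, interests):
    """One big topic: every app receives every event and filters client-side."""
    return Counter({app: len(events) for app in interests})


def delivered_topic_per_type(events, interests):
    """One topic per type: an app only receives the types it subscribed to."""
    counts = Counter()
    for event_type, _payload in events:
        for app, wanted in interests.items():
            if event_type in wanted:
                counts[app] += 1
    return counts


single = delivered_single_topic(events, interests)
split = delivered_topic_per_type(events, interests)
for app in interests:
    print(app, "| single topic:", single[app], "| topic per type:", split[app])
```

In this toy model only the application that wants both event types receives the same number of messages either way; the other two receive fewer once the topics are split.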
Matthias J. Sax
  • How would I proceed regarding more similar events such as user events? (for example: mouse and keyboard events - should they be in the same topic or split; how about left click events and right click events?) – user3452075 Nov 21 '16 at 08:47
  • I think what I am asking is where you draw the line when creating new topics. – user3452075 Nov 21 '16 at 10:06
  • That depends on your use case. There is no general "silver bullet" answer to this. – Matthias J. Sax Nov 21 '16 at 18:20
  • Of course, but what factors should be considered when making this decision? – user3452075 Nov 22 '16 at 08:59
  • I mentioned two important factors in my answer. For your use case, you need to decide for yourself; nobody besides you (and your colleagues) has enough insight to figure it out. – Matthias J. Sax Nov 22 '16 at 17:56

As @Matthias J. Sax said, there is no silver bullet here, but there are several aspects to take into account.

The deciding condition: ordered delivery

If your application needs guaranteed ordered delivery, the messages that must stay in order have to go to a single topic and share the same key, so that they land in the same partition (Kafka only guarantees order within a partition).
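
As a rough illustration of why the key matters, the sketch below simulates key-based partitioning in plain Python. It is a toy model, not Kafka's partitioner: `zlib.crc32` stands in for the hash (Kafka's default partitioner uses a different one), and `NUM_PARTITIONS`, `partition_for`, and `produce` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the topic


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(messages, num_partitions=NUM_PARTITIONS):
    """Append each (key, value) to the partition selected by its key."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


messages = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add_to_cart"),
    ("user-2", "logout"),
    ("user-1", "checkout"),
]

partitions = produce(messages)
for pid in sorted(partitions):
    print("partition", pid, partitions[pid])
```

Every message with the same key maps to the same partition, so the relative order of one user's events is preserved there. Events of different users may end up in different partitions, and no order is implied between those.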

If ordering is not mandatory, the game starts...

Is the schema the same for all messages?

Are consumers interested in the same types of events, or in different ones?

What will happen on the consumer side? Are we reducing or increasing complexity in terms of implementation, maintainability, error handling...?

Is horizontal scalability important to us? More topics often means more partitions, which means more capacity to scale horizontally. It also allows finer-grained scaling: on the broker side we can choose how many partitions to add per event type, and on the consumer side how many consumers to run per event type.
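
A small sketch of that per-type sizing, again in plain Python. The `assign` function is a simplified round-robin stand-in for Kafka's consumer group assignment, and the topic names, partition counts, and group sizes are made up. It relies on one fact: within a consumer group, a partition is read by at most one consumer, so the partition count caps the useful number of consumers.

```python
def assign(partitions, consumers):
    """Spread partitions over consumers round-robin (simplified stand-in)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


# Hypothetical topics, each sized independently for its own event type.
topic_partitions = {"clicks": 6, "db-changelog": 2}
# Hypothetical number of consumers started per event type.
group_sizes = {"clicks": 4, "db-changelog": 3}

for topic, partition_count in topic_partitions.items():
    parts = [f"{topic}-{i}" for i in range(partition_count)]
    consumers = [f"{topic}-consumer-{i}" for i in range(group_sizes[topic])]
    result = assign(parts, consumers)
    idle = [c for c, owned in result.items() if not owned]
    print(topic, result, "| idle:", idle)
```

Here the third `db-changelog` consumer gets nothing to do because that topic has only two partitions, while the high-volume `clicks` topic can keep four consumers busy.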

Does it make sense to parallelise consumption per message type? ...

Technically speaking, if consumers can select exactly which event types they consume, we reduce the network bandwidth spent sending unwanted messages from broker to consumer, as well as the number of deserialisations performed on them (less CPU used, which over time frees resources and reduces energy cost).

It is also worth remembering that splitting message types into different topics does not mean they must be consumed by different Kafka consumers, because a single consumer can subscribe to several topics at once.
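
One way this could look on the application side is a single poll loop that dispatches on the topic each record came from. The sketch below is plain Python without a Kafka client: `Record` is a minimal stand-in for a consumed record (real client records carry more fields), and the topic names and handlers are hypothetical.

```python
from collections import namedtuple

# Minimal stand-in for a consumed record; only the fields used here.
Record = namedtuple("Record", ["topic", "value"])


def handle_click(value):
    return f"click at {value}"


def handle_db_change(value):
    return f"db change: {value}"


# One handler per topic the single consumer is subscribed to (hypothetical names).
HANDLERS = {
    "clicks": handle_click,
    "db-changelog": handle_db_change,
}


def process(records):
    """Route each record to the handler registered for its topic."""
    results = []
    for record in records:
        handler = HANDLERS.get(record.topic)
        if handler is None:
            continue  # topic without a handler: skip the record
        results.append(handler(record.value))
    return results


polled = [
    Record("clicks", "(10, 20)"),
    Record("db-changelog", "users UPDATE"),
    Record("clicks", "(5, 7)"),
]
for line in process(polled):
    print(line)
```

Each topic keeps its own schema and handler, yet only one consumer instance has to be run and monitored.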

There is no clear-cut answer to this question, but my feeling is that, given Kafka's features, if ordered delivery is not needed we should split messages by type into different topics.

Dani
  • Regarding ordering, it is a bit more complicated in a real-life distributed system. If you have multiple nodes producing to Kafka, the order is obviously not guaranteed. Even with a single node, an async producer does not guarantee the order. And if you have a sync producer, your code blocks on each send. – Matvey Masov Oct 05 '20 at 11:15