
Assume I have the following JSON:

{
  "id": 12,
  "name": "bob",
  "address": {
    "street": "main",
    "code": 1234
  }
}

When the street name changes to 'new street', should I publish the complete document

{
  "id": 12,
  "name": "bob",
  "address": {
    "street": "new street",
    "code": 1234
  }
}

or just the change?

{
  "id": 12,
  "address": {
    "street": "new street"
  }
}

In the concrete case described here it is stated that both are allowed. The Kafka message size is limited, as described here, and so I would favor sending only the changes. But what was Kafka designed for?

In other words: should Kafka events be PUT or PATCH messages?

user3579222
  • I don't believe there is enough information about your use case to answer regarding the size of your data. My team sends roughly 120 JSON objects with over 100 fields per second on just one topic. I would completely avoid sending partial messages through Kafka. If your data is really that large, use Protobuf. – A Webb Jul 14 '22 at 13:39

2 Answers


Kafka is typically used for sending readable messages, although you can also send binary formats. Sending very large messages is not recommended due to performance issues on both the brokers and the clients.
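For illustration, publishing the full document could look like the short producer sketch below. The broker address, the topic name "customers", and keying the record by the entity id (so all updates to one entity stay in order on one partition) are my assumptions, not anything stated in the question:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CustomerProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // The complete document as a JSON string, keyed by the entity id.
        String value = "{\"id\":12,\"name\":\"bob\","
                + "\"address\":{\"street\":\"new street\",\"code\":1234}}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("customers", "12", value));
        }
    }
}

Keying by the entity id also lets you enable log compaction on the topic, so the broker eventually retains at least the latest full document per key; note that compaction only makes sense if each message is self-contained.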

Enforcing a limit on the message size ensures that you do not end up storing large files, which would increase the load on your Kafka brokers and client applications.

Regarding your original question of whether to store just the changes or the full message: you can choose either, depending on your use case.

If you store only the changes and you need the other information, you have to fetch it from somewhere else, perhaps a different database/datastore or another Kafka topic. This becomes an additional burden, because you are involving another datastore and writing additional code for fetching. In simple words, you need to perform joins.
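To illustrate that burden, a consumer of change-only events has to fetch and merge the current entity before it can do anything useful. In the sketch below, AddressChange, Address, Customer, and CustomerRepository are hypothetical placeholders for whatever types and datastore a real application would use:

import org.apache.kafka.clients.consumer.ConsumerRecord;

// Hypothetical types, for illustration only.
record AddressChange(int id, String street) {}
record Address(String street, int code) {}
record Customer(int id, String name, Address address) {}

interface CustomerRepository {
    Customer findById(int id);
    void save(Customer customer);
}

public class ChangeEventHandler {
    private final CustomerRepository repository;

    public ChangeEventHandler(CustomerRepository repository) {
        this.repository = repository;
    }

    public void handle(ConsumerRecord<String, AddressChange> record) {
        AddressChange change = record.value(); // carries only the id + new street
        // The event alone is incomplete: fetch the current entity and merge
        // the change into it. This lookup is the "join" described above.
        Customer current = repository.findById(change.id());
        Customer merged = new Customer(current.id(), current.name(),
                new Address(change.street(), current.address().code()));
        repository.save(merged);
    }
}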

If you store the whole message, you are saved from the performance hit of the joins, but you may end up with a relatively larger message size. Unless your messages are too large for your Kafka brokers or your clients to handle, and either or both of them show performance degradation because of the size, you can try tuning the size-related parameters before you move to the "change-only" approach.
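For reference, the size-related parameters are the broker's message.max.bytes (with the per-topic max.message.bytes override, plus replica.fetch.max.bytes so replication keeps up), the producer's max.request.size, and the consumer's fetch.max.bytes / max.partition.fetch.bytes. A sketch of raising them to an arbitrary 5 MB:

import java.util.Properties;

public class SizeTuning {
    public static void main(String[] args) {
        String fiveMb = Integer.toString(5 * 1024 * 1024); // arbitrary example limit

        // Producer: allow individual requests up to 5 MB.
        Properties producerProps = new Properties();
        producerProps.put("max.request.size", fiveMb);

        // Consumer: allow fetches of the same size.
        Properties consumerProps = new Properties();
        consumerProps.put("max.partition.fetch.bytes", fiveMb);
        consumerProps.put("fetch.max.bytes", fiveMb);

        // Broker/topic side, set in server.properties or via kafka-configs.sh:
        //   message.max.bytes=5242880        (broker-wide default)
        //   max.message.bytes=5242880        (per-topic override)
        //   replica.fetch.max.bytes=5242880  (so followers can replicate them)
    }
}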

Typically, we store de-normalized data in Kafka to avoid joins.

JavaTechnical

I would recommend sending the full entity with additional data such as:

  • time,
  • idempotency key,
  • information about what changed

provided you do not have too much traffic.

This will help avoid difficulties when:

  • data arrives several times
  • some data is skipped
  • changes are reordered somewhere
  • etc.

So, the structure can be something like this:

{
  "changed": "address",
  "time": 72980254,
  "entity": {
    "id": 12,
    "name": "bob",
    "address": {
      "street": "main",
      "code": 1234
    }
  }
}
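On the consuming side, the time and idempotency-key fields make those guards cheap to implement. A minimal sketch, with in-memory collections standing in for the persistent store a real consumer would need:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class EventGuard {
    // In-memory stand-ins for persistent storage.
    private final Set<String> seenKeys = new HashSet<>();
    private final Map<Integer, Long> lastTimePerEntity = new HashMap<>();

    // Returns true if the event should be applied, false if it is a
    // duplicate or arrived out of order for that entity.
    public boolean shouldApply(String idempotencyKey, int entityId, long time) {
        if (!seenKeys.add(idempotencyKey)) {
            return false; // the same event came several times
        }
        Long last = lastTimePerEntity.get(entityId);
        if (last != null && time < last) {
            return false; // an older change was reordered behind a newer one
        }
        lastTimePerEntity.put(entityId, time);
        return true;
    }
}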
gabba