
I have 2 approaches:

Approach #1

Kafka --> Spark Stream (processing data) --> Kafka -(Kafka Consumer)-> Nodejs (Socket.io)
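For context, the last hop of this pipeline could look roughly like the sketch below, assuming the kafkajs client (v2) and an output topic named processed-events (the broker address, topic, port, and event name are my placeholders, not part of the question):

```typescript
// Final hop of Approach #1: bridge the processed Kafka topic to browser clients.
// Broker address, topic name, port and event name are illustrative placeholders.
import { Kafka } from "kafkajs";
import { Server } from "socket.io";

const io = new Server(3000); // Socket.io server the browsers connect to
const kafka = new Kafka({ brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "socket-bridge" });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topics: ["processed-events"] });
  await consumer.run({
    // Push every processed record to all connected sockets as it arrives.
    eachMessage: async ({ message }) => {
      io.emit("update", message.value?.toString());
    },
  });
}

run().catch(console.error);
```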

Approach #2

Kafka --> Kafka Connect (processing data) --> MongoDB -(mongo-oplog-watch)-> Nodejs (Socket.io)

Note: in Approach #2, I use mongo-oplog-watch to detect when data is inserted.
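mongo-oplog-watch tails the oplog directly; since I can't vouch for its exact API here, the sketch below shows the same idea using the official MongoDB driver's change streams instead (which, like the oplog, require a replica set). Database, collection, and event names are illustrative:

```typescript
// Final hop of Approach #2, sketched with the official driver's change streams
// rather than mongo-oplog-watch (same idea: react to inserts as they happen).
import { MongoClient } from "mongodb";
import { Server } from "socket.io";

const io = new Server(3000);

async function run() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const events = client.db("app").collection("events");

  // Only surface insert operations, mirroring mongo-oplog-watch's insert events.
  const stream = events.watch([{ $match: { operationType: "insert" } }]);
  stream.on("change", (change) => {
    if (change.operationType === "insert") {
      io.emit("update", change.fullDocument);
    }
  });
}

run().catch(console.error);
```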

What are the advantages and disadvantages of using Kafka as storage versus another store like MongoDB, in a real-time application context?


1 Answer

Kafka topics typically have a retention period (7 days by default) after which messages are deleted. That said, there is no hard rule that you must not persist data in Kafka.

You can set the topic retention period to -1, which disables time-based deletion entirely (reference).
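For illustration, here is one way to apply that setting programmatically, sketched with the kafkajs admin client (the topic name and broker address are placeholders):

```typescript
// Set infinite retention on an existing topic via the kafkajs admin client.
import { Kafka, ConfigResourceTypes } from "kafkajs";

async function retainForever() {
  const admin = new Kafka({ brokers: ["localhost:9092"] }).admin();
  await admin.connect();
  await admin.alterConfigs({
    validateOnly: false,
    resources: [
      {
        type: ConfigResourceTypes.TOPIC,
        name: "processed-events", // placeholder topic name
        // retention.ms = -1 disables time-based deletion for this topic
        configEntries: [{ name: "retention.ms", value: "-1" }],
      },
    ],
  });
  await admin.disconnect();
}

retainForever().catch(console.error);
```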

The only problem I know of with persisting data in Kafka is security. Out of the box (at least as of now), Kafka doesn't provide data-at-rest encryption, so you need to go with a custom (or home-grown) solution, for example:

Protecting data-at-rest in Kafka with Vormetric

A KIP also exists for this, but it is still under discussion:

Add end to end encryption in Kafka (KIP)

MongoDB, on the other hand, seems to provide data-at-rest encryption:

Security data at rest in MongoDB

And most importantly, it also depends on the type of data you are going to store and what you want to do with it.

If you are dealing with access patterns more complex than a simple key-value model (give the key, get the value), for example querying by indexed fields (as you typically do with logs), then MongoDB could probably make sense.

In simple words, if you are querying by more than one field (other than the key), then storing it in MongoDB could make sense. If you intend to use Kafka for such a purpose, you would probably end up creating a topic for every field that should be queried... which is too much.
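A minimal sketch of the kind of multi-field access meant here, using the official MongoDB driver (collection, fields, and values are made up for illustration):

```typescript
// A multi-field lookup that is natural in MongoDB but awkward to model in Kafka.
import { MongoClient } from "mongodb";

async function queryLogs() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const logs = client.db("app").collection("logs");

  // A compound index makes queries on both fields (or their prefix) efficient.
  await logs.createIndex({ service: 1, level: 1 });

  // Query by two non-key fields; in Kafka you'd need a topic per access path.
  const errors = await logs
    .find({ service: "auth", level: "error" })
    .sort({ timestamp: -1 })
    .limit(20)
    .toArray();

  console.log(errors);
  await client.close();
}

queryLogs().catch(console.error);
```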
