Background

I inherited a Kafka/Zookeeper installation. I have a passing knowledge of both: I know the general architecture, how clients work, what topics are, etc., and I have been involved in programming Java clients.

But the installation is somewhat dubious. There are three instances each of Kafka and Zookeeper (in separate docker containers). Supposedly they should work, but what I am seeing is that all processes spout an immense amount of log output with loads and loads of (diverse) warnings and errors. I have the impression that some of these are quite normal (or are being self-healed all the time), and I am having a very hard time figuring out whether everything is set up correctly and works as intended.

Some of these are - according to Google - related to unclean shutdowns of the brokers; corrupted individual topics and such. As this is a test environment, I can easily delete such files.

I know about some commands which help me check topics etc. (basic stuff, like listing them, displaying their individual configuration etc.).

However...

Question

Is there an online resource/documentation which can be used as a systematic walkthrough to check whether everything is basically set up OK; for example, to clear up these questions:

  • Do the three Zookeepers and the three Kafka instances correctly talk to each other for high-availability purposes? Do they have a correct "leader" etc.?
  • Are the servers generally "healthy", i.e., easily able to accept connections etc.?
  • How are the topics working (what's in there, how many messages, etc.)?

I am aware that one may very quickly dismiss this question as too generic; I am not asking you to solve my problems. I am looking for a resource to systematically walk through such an installation. It may or may not cover the examples I have given, but it definitely should give a systematic way to find out if things are fundamentally wrong.

AnoE

2 Answers

This Packt tutorial/training by Stéphane Maarek is a wonderful resource for setting up Kafka in cluster mode. However, he did that on Ubuntu VMs in the AWS cloud.

I followed the same steps and installed it in Vagrant VMs running CentOS. You can find the code here.

The VM includes Yahoo's Kafka Manager for monitoring Kafka's internal details: the list of available brokers, their health, partitions, leaders, etc.

Kafka Manager can help you with high-level monitoring.

Please provide your comments.

user51

Rather than looking solely at logs, you might want to familiarize yourself with JMX metrics and how you can gather them across the cluster.
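As a rough illustration of what to look for once you have those metrics, here is a minimal Python sketch. It assumes you have already pulled a few well-known broker MBean values into a plain dict per broker with whatever JMX collector you use; the dict layout and the `find_problems` helper are made up for this example and are not part of Kafka.

```python
# MBean names as exposed by Kafka brokers over JMX.
UNDER_REPLICATED = "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
OFFLINE_PARTITIONS = "kafka.controller:type=KafkaController,name=OfflinePartitionsCount"
ACTIVE_CONTROLLER = "kafka.controller:type=KafkaController,name=ActiveControllerCount"


def find_problems(metrics_by_broker):
    """Return human-readable findings for a {broker: {mbean: value}} snapshot.

    The input layout is an assumption of this sketch; missing values are
    treated as 0, so a broker that reports nothing will not be flagged here.
    """
    problems = []
    for broker, metrics in sorted(metrics_by_broker.items()):
        under_replicated = metrics.get(UNDER_REPLICATED, 0)
        if under_replicated > 0:
            problems.append(f"{broker}: {under_replicated} under-replicated partition(s)")
        offline = metrics.get(OFFLINE_PARTITIONS, 0)
        if offline > 0:
            problems.append(f"{broker}: {offline} offline partition(s)")

    # Across the whole cluster exactly one broker should be the active controller.
    controllers = sum(m.get(ACTIVE_CONTROLLER, 0) for m in metrics_by_broker.values())
    if controllers != 1:
        problems.append(f"cluster: expected exactly 1 active controller, found {controllers}")
    return problems
```

An empty list from `find_problems` only means these three checks passed; it is a starting point, not a full health check.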

If you want to actually collect and analyze logs, you'll likely need to separately use something like Elasticsearch.

You won't see "how many messages" are in a topic, and you'll need even more monitoring to know whether a port is actually open, whether the Kafka process is running, whether the disks are filling up, etc.
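For the "is the port open" part, a quick probe can be done with nothing but the Python standard library. The sketch below also asks each ZooKeeper node for its role using the `srvr` four-letter command, which answers with a `Mode:` line. The hostnames in the usage note are hypothetical, and on newer ZooKeeper versions the command has to be allowed via `4lw.commands.whitelist`, so treat this as a sketch under those assumptions.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def zk_four_letter(host, port, cmd="srvr", timeout=2.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_zk_mode(srvr_reply):
    """Extract the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_looks_sane(modes):
    """A multi-node ensemble should report exactly one leader, the rest followers."""
    return modes.count("leader") == 1 and all(m in ("leader", "follower") for m in modes)


def check_cluster(zookeepers, brokers):
    """Probe lists of (host, port) pairs and return a small summary dict."""
    modes = []
    for host, port in zookeepers:
        try:
            modes.append(parse_zk_mode(zk_four_letter(host, port)))
        except OSError:
            modes.append(None)  # unreachable or timed out
    return {
        "zk_modes": modes,
        "zk_sane": ensemble_looks_sane(modes),
        "brokers_open": {f"{h}:{p}": port_is_open(h, p) for h, p in brokers},
    }
```

With made-up container names you would call something like `check_cluster([("zk1", 2181), ("zk2", 2181), ("zk3", 2181)], [("kafka1", 9092), ("kafka2", 9092), ("kafka3", 9092)])`. An open port only tells you that something is listening there; it does not tell you the broker is healthy.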

My point here is that Kafka needs to be fed and watered. If you plan to productionize it, you can't just set up a small cluster and forget about it. Even if you think it's set up correctly at the beginning, increasing the load on it will eventually cause it to fall into a bad state.

For a limited trial in your dev environment that gives you a full look at your cluster health, Confluent Control Center can assist with that.


To solve the "what's in there" problem, I suggest you set up a Schema Registry and convince Kafka producers to use it.

OneCricketeer