2

The default value of log.dirs in server.properties is /tmp/kafka-logs. Important files are usually kept out of /tmp, yet this is where Kafka stores messages and offsets!

Is there any particular reason why one should not choose /var/log/kafka-logs or /opt/kafka-logs?

NOTE - Assume /tmp and /var/log are on the same file-system type.

Divs
  • 121
  • 3
  • It baffles me that kafka developers continue to have /tmp as the default location for critical data! Why is there not more noise about this? – Dojo Nov 16 '22 at 05:03

2 Answers

3

You'll always find me placing files in standard directories or as close to them as possible.

The reason for this is so that future admins can find them later -- because very often that future admin is me!

Consider logs, for instance, since that's what you have brought up. I would create a subdirectory in /var/log to store these, such as /var/log/kafka. The directory /var/log is where most admins will go first to look for logs for any package. Apache's default of /tmp/kafka-logs is pretty senseless, as you've already discovered. Cloudera's default log directory /var/log/kafka makes much more sense.

If it turns out that you need to mount a disk partition to store logs, you don't have to change the log directory; instead, you can mount the new disk space directly at /var/log/kafka.

And /opt is intended for large third party packages; it's not where I expect to find most things. There are few standards or conventions for anything in this directory, so things could end up difficult to find.

Michael Hampton
  • 244,070
  • 43
  • 506
  • 972
  • 2
    Please be aware that kafka logs are not really logs but kafka stream messages https://stackoverflow.com/questions/42757742/kafka-logs-folder-too-huge . So the recommendation for /var/log/ might not be the best one :) – Alex H Jul 27 '18 at 16:13
  • And, hopefully you don't mind, but the default is from Cloudera, which is a spinoff like Ambari or confluent-kafka. The question only mentions kafka, so that might be Apache Kafka http://kafka.apache.org/090/documentation.html#brokerconfigs . They do have at least a couple of differences between them. I hope this helps us both. – Alex H Jul 27 '18 at 16:18
  • That's all true. I'm going to think about it a bit more and will probably edit this later. – Michael Hampton Jul 27 '18 at 16:31
  • I understand why /tmp is not recommended, but why /opt as well? if it is intended for third party packages then it means this doesn't get cleaned up as /tmp is...? – aurelius Oct 06 '20 at 07:11
2

The best place is a separate partition, mounted so that all of the data is in one place, which serves no other function for the OS or other installed packages.

/var/log/ is also handled by other programs, such as logrotate, so it is not really safe to keep your Kafka data there.

/opt/ should not hold any program data, only additional installed software.

Depending on what kind of messages you have, you might want to limit any interaction/possible issue with them.
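
As a rough illustration of the points above, here is a small Python sketch that reads server.properties-style text and flags any log.dirs entry that sits under /tmp. The function names parse_log_dirs and risky_dirs, and the sample path /data/kafka, are made up for this example; only the log.dirs key and the /tmp/kafka-logs default come from Kafka itself.

```python
from pathlib import PurePosixPath


def parse_log_dirs(properties_text):
    """Return the directories listed under log.dirs (comma-separated)."""
    dirs = []
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "log.dirs":
            dirs = [d.strip() for d in value.split(",") if d.strip()]
    return dirs


def risky_dirs(dirs, risky_roots=("/tmp",)):
    """Return the entries that are, or live under, one of risky_roots."""
    flagged = []
    for d in dirs:
        path = PurePosixPath(d)
        for root in map(PurePosixPath, risky_roots):
            if path == root or root in path.parents:
                flagged.append(d)
                break
    return flagged


# Hypothetical config text, for illustration only.
sample = """
# example broker settings
broker.id=0
log.dirs=/tmp/kafka-logs,/data/kafka
"""

print(risky_dirs(parse_log_dirs(sample)))
```

The check compares whole path components, so a directory such as /tmpfoo is not mistaken for something under /tmp.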

Alex H
  • 1,814
  • 11
  • 18
  • Thanks @Alex H. I am not so good with the Linux file system, so I don't understand partitions/blocks that well. Can you please give me an example in terms of a directory? – Divs Jul 27 '18 at 16:18
  • 1
    You could create a directory under /var/ and hold the data there. Make sure that the kafka folder does not reside on root (as that might crash the VM) and that you have enough space there for what you need. Np, we are all learning.
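
The two checks mentioned in that last comment (not on the root filesystem, enough free space) could be sketched in Python roughly as follows. This is only an approximation under some assumptions: it treats "same device ID as /" as meaning "on the root filesystem", which can be misleading with bind mounts or subvolumes, and the function names and the 10 GiB threshold are arbitrary choices for the example.

```python
import os
import shutil


def on_root_filesystem(path):
    """True if path is on the same device as / (a heuristic, see above)."""
    return os.stat(path).st_dev == os.stat("/").st_dev


def free_gib(path):
    """Free space on the filesystem holding path, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def check_data_dir(path, min_free_gib=10):
    """Return a list of warnings for a candidate Kafka data directory."""
    warnings = []
    if on_root_filesystem(path):
        warnings.append(f"{path} is on the root filesystem")
    if free_gib(path) < min_free_gib:
        warnings.append(f"{path} has less than {min_free_gib} GiB free")
    return warnings


# "/" always exists, so it serves as a demo; pass your own directory instead.
for warning in check_data_dir("/"):
    print(warning)
```

An empty list means neither warning applies to the directory that was checked.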