
I would like to get advice from experienced people on building an HA infrastructure to log 2 TB of data in JSON format every week. I need a retention time of 7 days, and I need to be able to query this data through an API.

The global requirements are:

- Handle 400,000,000 requests per month (~154/s; see the quick check after this list).
- Handle flat logs in JSON format and clean them by removing unneeded information, to save disk space in the end.
- Have a real-time dashboard of the flat logs.
- Have a dashboard for building charts and running queries on the final logs stored for 7 days.
- Have an API so that external software can retrieve data from these logs.
- The infrastructure should be able to grow easily in the future.
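For reference, the per-second figure follows directly from the monthly one; a quick check in Python, assuming a 30-day month:

```python
# Sanity check of the request rate quoted above (assumes a 30-day month).
requests_per_month = 400_000_000
seconds_per_month = 30 * 24 * 3600  # 2,592,000 s

print(f"{requests_per_month / seconds_per_month:.0f} requests/s")  # ~154
```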

I was thinking of creating an on-premises Kubernetes cluster on 6 dedicated servers (3 masters and 3 workers). The 3 worker nodes will each have 2 TB of SSD disk space. The flat log requests will be sent to Kafka, and Graylog will subscribe to the Kafka topic and forward the logs to Elasticsearch, so:

- Kafka lets me handle thousands of requests per second without problems (because occasional peaks are possible); a minimal producer sketch follows this list.
- Graylog lets me handle the logs in real time and gives me a dashboard.
- Elasticsearch gives me an analytics dashboard and an API.
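To make the ingestion side concrete, here is a minimal producer sketch using the kafka-python client; the broker addresses and the topic name `flat-logs` are placeholder assumptions, not part of the design above:

```python
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # pip install kafka-python

# Broker list and topic name are hypothetical placeholders.
producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092", "kafka-2:9092", "kafka-3:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks=1,        # acknowledge on the partition leader only, for throughput
    linger_ms=50,  # batch messages briefly to absorb traffic peaks
)

log_entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "level": "info",
    "message": "example flat log entry",
}
producer.send("flat-logs", log_entry)
producer.flush()  # block until the message is actually delivered
```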

What do you think of this? Have you already worked on a big logging system? If so, what did you use?

1 Answer


If I understand correctly, you want to store and index this JSON data as it is, in plain text form. While storing it should not be a great issue, indexing it for fast search and data visualization requires a binary format. Graylog (and Elasticsearch) does exactly that, using a binary index to store messages.

So the key questions are:

  • can Graylog (or Elasticsearch) be used for such a task? Yes. 150 req/s is not such a big rate, especially with each entry being ~24 KB (2 TB / 7 days / 86400 s / 150 req/s), giving a write rate of ~4 MB/s (see the quick check after this list)

  • what hardware setup is required? I would start with a single SSD-equipped machine using ZFS, configured with ashift=12, sync=disabled, atime=off, compression=lz4, recordsize=16K, xattr=off (the commands are sketched after this list). Scale it out as required after collecting valuable real-world experience.
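A quick back-of-the-envelope check of the sizing figures above (taking 2 TB as 2 TiB):

```python
# Back-of-the-envelope check of the sizing figures above.
weekly_volume = 2 * 2**40      # 2 TiB in bytes
entries_per_second = 150
seconds_per_week = 7 * 86400

bytes_per_second = weekly_volume / seconds_per_week
bytes_per_entry = bytes_per_second / entries_per_second

print(f"write rate: {bytes_per_second / 2**20:.1f} MiB/s")  # ~3.5 MiB/s
print(f"entry size: {bytes_per_entry / 2**10:.1f} KiB")     # ~24 KiB
```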
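And for completeness, a sketch of how those ZFS properties would be applied, expressed as a thin Python wrapper around the ZFS command line; the pool name `logs` and the device names are hypothetical, and note that ashift can only be set at pool creation time, while the other properties can be changed later:

```python
import subprocess

POOL = "logs"                        # hypothetical pool name
DEVICES = ["/dev/sda", "/dev/sdb"]   # hypothetical devices

def run(*cmd: str) -> None:
    """Run a command, raising if it fails."""
    subprocess.run(cmd, check=True)

# ashift is immutable after creation, so it must be set here.
run("zpool", "create", "-o", "ashift=12", POOL, *DEVICES)

# The remaining properties are dataset-level and can be changed at any time.
for prop in (
    "sync=disabled",    # skip synchronous writes; recent data is at risk on power loss
    "atime=off",        # avoid a metadata write on every read
    "compression=lz4",  # cheap compression that works well on JSON text
    "recordsize=16K",   # smaller records suit index-style random I/O
    "xattr=off",        # extended attributes are not needed here
):
    run("zfs", "set", prop, POOL)
```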

shodanshok