Better way to scale out logstash and balance loading?

Question

The question originated from: https://groups.google.com/forum/#!topic/logstash-users/cYv8ULhHeE0

Comparing the logstash scale-out strategies below, a TCP load balancer gives the best performance as long as traffic and CPU load are balanced. However, it seems hard to keep traffic balanced all the time because of the persistent TCP connections between logstash-forwarder and logstash. Does anyone have a better idea for keeping traffic and CPU load balanced across logstash nodes? Thanks for any advice :)

< My scenario >

10+ service nodes, each running logstash-forwarder to forward logs to a central logstash node (cluster).
Each service node's average log throughput, daily throughput distribution, and filter complexity per log type vary a lot:
- average log throughput: e.g. service_1: 0.5k events/s; service_2: 5k events/s
- daily throughput distribution: e.g. service_1 peaks in the morning, service_2 peaks at night
- filter complexity per log type: e.g. using 100% of a single logstash node's CPU, service_1's log type can be processed at 300 events/s, while service_2's log type can be processed at 1500 events/s

< TCP load balancer >

TCP connections between logstash-forwarder and logstash are persistent. So even if the number of connections ends up balanced across all logstash nodes (by least connection, least load, etc.), that doesn't guarantee traffic or CPU load is balanced across them. In my scenario, each TCP connection's traffic varies in daily average, in distribution over time, and in event complexity. So in the worst case, say logstash_1 and logstash_2 both have 10 TCP connections, logstash_1's CPU load might still be 3x that of logstash_2, because logstash_1's connections carry higher traffic and more complex events.
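A rough sketch of that worst case. The processing rates (300 and 1500 events/s at 100% CPU) come from the scenario above; the per-connection event rates and the names `HEAVY`, `LIGHT`, and `node_cpu_load` are made up for illustration, and the model (CPU load = event rate / max processing rate) is a simplifying assumption.

```python
# Hypothetical connections: (events per second, events per second that one
# logstash node can process for this log type at 100% CPU).
HEAVY = (27.0, 300.0)    # complex filters, like service_1's log type
LIGHT = (45.0, 1500.0)   # simple filters, like service_2's log type


def node_cpu_load(connections):
    """Estimate the fraction of one node's CPU used by its connections."""
    return sum(rate / capacity for rate, capacity in connections)


# Both nodes hold 10 connections, so a least-connection balancer
# considers them equally loaded.
logstash_1 = [HEAVY] * 10
logstash_2 = [LIGHT] * 10

load_1 = node_cpu_load(logstash_1)
load_2 = node_cpu_load(logstash_2)
print(f"logstash_1: {len(logstash_1)} connections, CPU load {load_1:.2f}")
print(f"logstash_2: {len(logstash_2)} connections, CPU load {load_2:.2f}")
print(f"ratio: {load_1 / load_2:.1f}x")
```

With these made-up numbers the light connections even carry more events per second, yet the node handling them needs roughly a third of the CPU.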

< Manually assign logstash-forwarders to logstash >

This might face the same problem as the TCP load balancer: we can plan the load distribution based on historical daily average traffic, but traffic changes over time. And of course there is no HA.
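For reference, planning by historical averages could look something like the sketch below. It is only one possible approach (a greedy "heaviest first onto the least-loaded node" heuristic), and the forwarder names, costs, and the `assign_forwarders` helper are all hypothetical.

```python
import heapq


def assign_forwarders(costs, node_count):
    """Greedy plan: heaviest forwarder first, always onto the least-loaded node.

    costs maps a forwarder name to its estimated CPU cost (historical
    average, in percent of one logstash node's CPU).
    Returns {node_index: [forwarder names]}.
    """
    heap = [(0, node) for node in range(node_count)]  # (planned load, node)
    heapq.heapify(heap)
    plan = {node: [] for node in range(node_count)}
    by_cost = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    for name, cost in by_cost:
        load, node = heapq.heappop(heap)   # least-loaded node so far
        plan[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return plan


# Hypothetical historical averages, in percent of one node's CPU.
costs = {"svc_a": 60, "svc_b": 50, "svc_c": 30, "svc_d": 20, "svc_e": 20}
plan = assign_forwarders(costs, node_count=2)
for node, names in plan.items():
    print(node, names, sum(costs[n] for n in names))
```

The plan is static, so it is only as good as the averages it was built from: it does not follow daily peaks and does nothing when a logstash node goes down.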

< Message queue >

Architecture: service node with logstash-forwarder -> queuer: logstash to RabbitMQ -> indexer: logstash from RabbitMQ to Elasticsearch

Around 30% CPU overhead on every node for sending messages to or receiving messages from the queue broker.
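A back-of-the-envelope estimate of what that overhead costs in node count. This assumes the 30% is a fraction of each node's total CPU that is no longer available for filtering; the baseline of 10 nodes and the `nodes_needed` helper are hypothetical, and the estimate covers a single tier only.

```python
import math


def nodes_needed(baseline_nodes, broker_cpu_fraction):
    """Nodes required to keep the baseline throughput when a fraction of
    every node's CPU is spent talking to the queue broker (assumed model)."""
    if not 0 <= broker_cpu_fraction < 1:
        raise ValueError("broker_cpu_fraction must be in [0, 1)")
    usable = 1.0 - broker_cpu_fraction   # CPU share left for processing
    return math.ceil(baseline_nodes / usable)


# Hypothetical baseline: 10 logstash nodes behind a TCP load balancer.
print(nodes_needed(10, 0.30))
```

Under this model the indexer tier alone grows by a factor of about 1/0.7; the separate queuer tier in front of RabbitMQ comes on top of that.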

The best option for scaling out is an intermediate broker/buffer that can efficiently receive and send a lot of data. Logstash is made for processing message streams, which is why it's recommended to use redis/rabbitmq/sqs as a broker. This way you can tune the forwarder and indexer independently, since they are decoupled. — Frank, Sep 27 '14 at 07:05
I don't mind logstash's CPU overhead in a clustered environment, since I simply add another node to compensate for the performance hit from logstash. — Frank, Sep 27 '14 at 07:11
Thanks for the advice, I like the intermediate broker approach, too. However, the extra 30% CPU on every node that consumes/produces messages from RabbitMQ does a lot of damage under resource constraints, which means I might have to reserve up to 1.5~2 times the nodes to get the same throughput as the TCP load balancer. And cost is my boss's concern, too. — Jim Horng, Sep 27 '14 at 10:44
I had a lot of trouble with RMQ and ended up using redis instead. I found it incredibly efficient and easy to set up. — Andrew, Dec 16 '14 at 19:25

Paul Mooney · Answer 1 · 2015-10-23T15:31:06.657

I’ll focus on one aspect of your question: load-balancing a RabbitMQ cluster. In a RabbitMQ cluster with mirrored queues, each queue has a single Master node and 0…n Slave nodes. It is therefore favourable to force connections to the Master node, rather than implement round-robin, leastconn, etc.

RabbitMQ will automatically route traffic to the Master node, even if your load balancer routes to a different node. This post explains the concept in greater detail.
