The question originated from: https://groups.google.com/forum/#!topic/logstash-users/cYv8ULhHeE0
Comparing the logstash scale-out strategies below, a TCP load balancer gives the best performance, provided that traffic and CPU load are balanced. However, it seems hard to keep traffic balanced all the time because of the persistent nature of logstash-forwarder <-> logstash TCP connections. Does anyone have a better idea for balancing traffic/CPU load across logstash nodes? Thanks for any advice :)
< My scenario >
- 10+ service nodes, each running logstash-forwarder to forward logs to a central logstash node (cluster)
- each service node's average log throughput, daily throughput distribution, and log type's filter complexity vary a lot:
  - average log throughput: e.g. service_1: 0.5k events/s; service_2: 5k events/s
  - daily throughput distribution: e.g. service_1 peaks in the morning, service_2 peaks at night
  - log type's filter complexity: using 100% of a single logstash node's CPU, service_1's log type can be processed at 300 events/s, while service_2's log type can be processed at 1500 events/s
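To put numbers on the scenario above, here is a rough back-of-the-envelope calculation of how many fully busy logstash nodes each service would need on average, using only the example figures from the list. The function name `nodes_needed` is made up for illustration, and it assumes CPU cost scales linearly with event rate.

```python
def nodes_needed(events_per_s, max_events_per_s_per_node):
    """Fully busy logstash nodes needed, assuming CPU cost is linear in event rate."""
    return events_per_s / max_events_per_s_per_node


# Example figures from the scenario: average throughput and
# the rate one node can filter at 100% CPU for that log type.
service_1 = nodes_needed(500, 300)
service_2 = nodes_needed(5000, 1500)

print(f"service_1: {service_1:.2f} nodes")
print(f"service_2: {service_2:.2f} nodes")
print(f"CPU ratio service_2 / service_1: {service_2 / service_1:.1f}x")
```

With these figures service_2 sends 10x the events of service_1 but needs only about 2x the CPU, so neither connection count nor raw event rate is a good proxy for load.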
< TCP load balancer >
TCP connections between logstash-forwarder and logstash are persistent. So even if the load balancer ends up distributing the connections evenly (or by least connections, or by least load) across all logstash nodes, that doesn't guarantee that traffic/CPU load is balanced across them. In my scenario, each TCP connection's traffic varies in daily average, over time, and in event complexity. So in the worst case, say logstash_1 and logstash_2 both have 10 TCP connections, logstash_1's CPU load might still be 3x higher than logstash_2's, because logstash_1's connections carry higher traffic and more complex events.
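A small sketch of that worst case, assuming a least-connections balancer and made-up per-connection numbers (the `HEAVY` and `LIGHT` rates are hypothetical, only the 300 and 1500 events/s capacities come from my scenario). Each connection is an `(events_per_s, max_events_per_s_at_full_cpu)` pair, and a node's CPU load is the sum of rate / capacity over its connections.

```python
def assign_least_connections(connections, node_count):
    """Pin each new connection to the node that currently has the fewest connections."""
    nodes = [[] for _ in range(node_count)]
    for conn in connections:
        target = min(nodes, key=len)  # ties go to the first such node
        target.append(conn)
    return nodes


def cpu_load(node):
    """Fraction of one node's CPU consumed by its connections."""
    return sum(rate / capacity for rate, capacity in node)


# Hypothetical connections: (events/s, events/s one node can filter at 100% CPU)
HEAVY = (30.0, 300.0)    # complex events
LIGHT = (50.0, 1500.0)   # simple events

# Heavy and light forwarders happen to connect alternately.
connections = [HEAVY, LIGHT] * 10
nodes = assign_least_connections(connections, node_count=2)
loads = [cpu_load(node) for node in nodes]

for i, (node, load) in enumerate(zip(nodes, loads), start=1):
    print(f"logstash_{i}: {len(node)} connections, CPU load {load:.2f}")
```

Both nodes end up with 10 connections, yet the first one carries about 3x the CPU load of the second, and because the connections are persistent the balancer never gets a chance to correct it.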
< Manual assign logstash-forwarders to logstash >
This might run into the same problem as the TCP load balancer: we can plan the distribution based on historical daily average traffic, but traffic changes over time. And of course there is no HA.
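As an illustration of why a static plan drifts, here is a sketch that plans the assignment with a simple greedy heuristic (largest service first, onto the least-loaded node) and then re-evaluates the same plan against a different time of day. The heuristic is just one way to do the planning, and the service names and percentages are invented; each value is the share of one logstash node's CPU that the service needs.

```python
import heapq


def plan_assignment(avg_cost, node_count):
    """Greedy plan: take services by descending cost, put each on the least-loaded node."""
    heap = [(0, node) for node in range(node_count)]  # (planned load, node index)
    plan = {node: [] for node in range(node_count)}
    for service, cost in sorted(avg_cost.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)
        plan[node].append(service)
        heapq.heappush(heap, (load + cost, node))
    return plan


def node_loads(plan, cost):
    """CPU load per node (percent of one node) for a given cost table."""
    return {node: sum(cost[s] for s in services) for node, services in plan.items()}


# Hypothetical numbers, in percent of one node's CPU.
historical_avg = {"service_a": 60, "service_b": 50, "service_c": 30, "service_d": 20}
evening_peak = {"service_a": 100, "service_b": 20, "service_c": 10, "service_d": 70}

plan = plan_assignment(historical_avg, node_count=2)
print("plan:", plan)
print("load on historical average:", node_loads(plan, historical_avg))
print("load at evening peak:", node_loads(plan, evening_peak))
```

The plan is perfectly balanced on the historical average but badly skewed at the hypothetical evening peak, since the assignment is fixed while the per-service load is not.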
< Message queue >
Architecture: service node with logstash-forwarder -> queuer: logstash to RabbitMQ -> indexer: logstash from RabbitMQ to Elasticsearch
- around 30% CPU overhead on all nodes for sending messages to, or receiving messages from, the queue broker.
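For comparison with the connection-based approaches, here is a minimal simulation of indexers pulling events one at a time from a shared queue. It is an in-process stand-in, not RabbitMQ, and it assumes the queue always has a backlog and ignores the ~30% broker overhead mentioned above. The per-event costs are 1/300 s and 1/1500 s rounded to microseconds (from the filter rates in my scenario); the event mix and `simulate_pull` are made up.

```python
import heapq


def simulate_pull(event_costs, worker_count):
    """Each event goes to whichever worker becomes free first; returns busy time per worker."""
    free_at = [(0, w) for w in range(worker_count)]  # (time worker is free, worker index)
    heapq.heapify(free_at)
    busy = [0] * worker_count
    for cost in event_costs:
        t, w = heapq.heappop(free_at)
        busy[w] += cost
        heapq.heappush(free_at, (t + cost, w))
    return busy


# CPU time per event in microseconds: complex events ~1/300 s, simple ones ~1/1500 s.
COMPLEX_US = 3333
SIMPLE_US = 667

event_costs = [COMPLEX_US, SIMPLE_US] * 5000  # hypothetical mix of 10,000 events
busy = simulate_pull(event_costs, worker_count=2)

for i, b in enumerate(busy, start=1):
    print(f"indexer_{i}: busy {b / 1_000_000:.2f} s")
print(f"imbalance: {max(busy) - min(busy)} us")
```

Because work is handed out per event rather than per connection, the busy times of the indexers can never differ by more than the cost of one event, whatever the mix. The price is the broker overhead.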