Logstash architecture decisions

Question

We have a bunch of servers running on Amazon Web Services EC2 and are looking to set up Logstash/Elasticsearch for distributed logging.

From what I have read, these are the options most commonly chosen:

1. Logstash on each server node, using the file input and writing directly to the Elasticsearch cluster through the elasticsearch output.
2. logstash-forwarder on each server node, connecting to a Logstash instance on the Elasticsearch cluster, which passes the events on to Elasticsearch as its output.
3. Logstash on each server node, using the file input and using Redis as a queue. A Logstash instance on each Elasticsearch node then picks events up from Redis and passes them to Elasticsearch.

There are also variants using AsyncAppender (which does not have a good reputation).

I am tempted to choose #1, particularly since we are using a patternLayout that automatically converts to JSON. We would write extra log files containing the JSON on each server node and have a file input send them directly to Elasticsearch.

What are the negatives of this? Why is a queue/broker often recommended?

It does seem like the file input by itself is not so robust when it is unable to connect to Elasticsearch. Is this a primary reason for the queue? — MJB, Feb 13 '15 at 02:24
We went with option #2. For one thing, I don't like the idea of running JVMs on all our servers. It has worked well for us. It's nice having your filters defined just once, too. — ficuscr, Feb 13 '15 at 19:37
Fair enough - but since all our app servers are java anyway, not an issue for us ;) — MJB, Feb 13 '15 at 19:49

score 0 · Accepted Answer · answered Feb 21 '15 at 03:03

Here are some issues with your scenarios:

1: You must run a JVM on each machine, with the associated memory footprint and maintenance issues. Since each instance writes straight to Elasticsearch, your filters have to be distributed to every machine.

3: Still that JVM on each server, plus the extra redis step.

The fact that your app already requires the JVM isn't a great reason to pile more stuff on it. This is especially true in AWS, where that bill comes in every month...

Note that logstash and logstash-forwarder will both back off when logstash is busy, so you don't need a broker like redis in this environment (as long as you can get logstash running before your log files rotate).
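
To illustrate the back-off idea, here is a minimal Python sketch. It is not how logstash-forwarder is actually implemented. The names `ship_file`, `BusyError`, and `send` are made up for the example, and `send` stands in for whatever delivers an event to the central logstash. The point is that the read offset only advances after a successful send, so the log file on disk plays the role a broker would otherwise play.

```python
import time


class BusyError(Exception):
    """Raised by the hypothetical downstream when it cannot accept events."""


def ship_file(path, send, offset=0, base_delay=0.01, max_delay=1.0,
              sleep=time.sleep):
    """Ship complete lines from `path`, starting at byte `offset`.

    The offset is advanced only after `send` succeeds, so nothing is lost
    while the downstream is busy. Returns the new offset to resume from.
    """
    delay = base_delay
    with open(path, "rb") as fh:
        fh.seek(offset)
        while True:
            line = fh.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a half-written line: retry it later
            while True:
                try:
                    send(line.rstrip(b"\n").decode("utf-8"))
                    break
                except BusyError:
                    sleep(delay)  # back off instead of dropping the event
                    delay = min(delay * 2, max_delay)
            delay = base_delay  # reset the back-off after a successful send
            offset = fh.tell()
    return offset
```

The caveat from above shows up here too: this only works while the file at `path` still exists. If the log rotates away before the downstream comes back, the unshipped lines go with it, and that is the case where a queue earns its keep.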

If you can, run logstash-forwarder on the servers, sending their output to a centralized logstash server and then on to elasticsearch. That's basically your #2 option.
