How do I configure Filebeat to read Apache Spark application logs? The logs are moved to the history server in a non-readable format as soon as the application completes. What is the ideal approach here?
1 Answer
You can configure Spark logging via Log4j. For a discussion of some edge cases in setting up the log4j configuration, see SPARK-16784, but if you simply want to collect all application logs coming off a cluster (rather than per-job logs) you shouldn't need to consider any of that.
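A minimal `log4j.properties` sketch for that simpler case, assuming log4j 1.x as shipped with older Spark releases; the file path, appender name, and rotation settings are illustrative, not prescribed by Spark:

```
# conf/log4j.properties on each Spark node (path and sizes are illustrative)
log4j.rootCategory=INFO, file

# Roll application logs into a local file that Filebeat can tail
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/spark.log
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```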
On the ELK side, there was a log4j input plugin for Logstash, but it is deprecated.
Thankfully, the documentation for the deprecated plugin describes how to configure log4j to write data locally for Filebeat, and how to set up Filebeat to consume this data and send it to a Logstash instance. This is now the recommended way to ship logs from systems using log4j.
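A minimal `filebeat.yml` sketch along those lines; the log path, the multiline pattern (keyed to the timestamp format above), and the Logstash host are assumptions, and older Filebeat releases use `filebeat.prospectors` instead of `filebeat.inputs`:

```yaml
# filebeat.yml (paths and hosts are illustrative)
filebeat.inputs:
  - type: log
    paths:
      - /var/log/spark/*.log
    # Join stack traces onto the log line that starts with a timestamp
    multiline.pattern: '^\d{2}/\d{2}/\d{2}'
    multiline.negate: true
    multiline.match: after

output.logstash:
  hosts: ["logstash-host:5044"]
```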
So in summary, the recommended way to get logs from Spark into ELK is:
- Set the Log4J configuration for your Spark cluster to write to local files
- Run Filebeat to consume these files and send the events to Logstash
- Logstash will send the data on to Elasticsearch (a pipeline sketch follows this list)
- You can search through your indexed log data using Kibana
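A minimal Logstash pipeline sketch for the last two steps; the port, Elasticsearch hosts, and index name are illustrative:

```
# logstash pipeline: receive from Filebeat, index into Elasticsearch
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "spark-logs-%{+YYYY.MM.dd}"
  }
}
```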

turtlemonvh