How do I configure Filebeat to read Apache Spark application logs? The logs are moved to the history server in a non-readable format as soon as the application completes. What is the ideal approach here?
1 Answer
You can configure Spark logging via Log4j. For a discussion of some edge cases in setting up the log4j configuration, see SPARK-16784, but if you simply want to collect all application logs coming off a cluster (rather than per-job logs) you shouldn't need to consider any of that.
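A minimal `log4j.properties` sketch for that simpler case, assuming log4j 1.x as shipped with older Spark releases; the file path, appender name, and rotation settings are illustrative, not prescribed by Spark:

```
# conf/log4j.properties on each Spark node (path and sizes are illustrative)
log4j.rootCategory=INFO, file

# Roll application logs into a local file that Filebeat can tail
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/spark.log
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```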
On the ELK side, there was a log4j input plugin for Logstash, but it is deprecated.
Thankfully, the documentation for the deprecated plugin describes how to configure log4j to write data locally for Filebeat, and how to set up Filebeat to consume this data and send it to a Logstash instance. This is now the recommended way to ship logs from systems using log4j.
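A minimal `filebeat.yml` sketch along those lines; the log path, the multiline pattern (keyed to the timestamp format above), and the Logstash host are assumptions, and older Filebeat releases use `filebeat.prospectors` instead of `filebeat.inputs`:

```yaml
# filebeat.yml (paths and hosts are illustrative)
filebeat.inputs:
  - type: log
    paths:
      - /var/log/spark/*.log
    # Join stack traces onto the log line that starts with a timestamp
    multiline.pattern: '^\d{2}/\d{2}/\d{2}'
    multiline.negate: true
    multiline.match: after

output.logstash:
  hosts: ["logstash-host:5044"]
```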
So in summary, the recommended way to get logs from Spark into ELK is:
- Set the Log4J configuration for your Spark cluster to write to local files
- Run Filebeat to consume these files and send the events to Logstash
- Logstash will send the data on to Elasticsearch (a pipeline sketch follows this list)
- You can search through your indexed log data using Kibana
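A minimal Logstash pipeline sketch for the last two steps; the port, Elasticsearch hosts, and index name are illustrative:

```
# logstash pipeline: receive from Filebeat, index into Elasticsearch
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "spark-logs-%{+YYYY.MM.dd}"
  }
}
```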

turtlemonvh