
I have a Flink job which uses Logback as the logging framework, since the logs need to be sent to Logstash and Logback has a Logstash appender (logstash-logback-appender). The appender works fine and I can see the application logs in Logstash when the Flink job is run from an IDE like Eclipse. The logging configuration file logback.xml is placed in src/main/resources and gets included on the classpath. Logging also works fine when the job is run from the command line outside the IDE.
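
For reference, a minimal logback.xml along these lines (the appender setup is a sketch and the Logstash destination below is a placeholder, not the actual value from the setup):

```xml
<!-- src/main/resources/logback.xml (minimal sketch; destination is a placeholder) -->
<configuration>
  <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <destination>logstash-host:5044</destination>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="LOGSTASH"/>
    <appender-ref ref="CONSOLE"/>
  </root>
</configuration>
```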

However, when I deploy this job on a Flink cluster (standalone, started using ./start-cluster.bat) through the Flink dashboard, the Logback configuration is ignored and the logs are not sent to Logstash.

I read up more on Flink's logging mechanism and came across the documentation on configuring Logback. The steps described there work fine, with some additional steps such as adding the logstash-logback-encoder library to the lib/ folder along with the Logback jars.
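
Concretely, after following those steps the lib/ folder ends up looking roughly like this (jar names and versions below are illustrative only, not exact):

```
flink/lib/
├── logback-core-1.2.x.jar
├── logback-classic-1.2.x.jar
├── log4j-over-slf4j-1.7.x.jar        # in place of the removed log4j / slf4j-log4j12 jars
├── logstash-logback-encoder-5.x.jar  # the additional step mentioned above
└── ...                               # the remaining Flink jars
```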

Even though the steps above work, this is problematic: the Logback configuration in the flink/conf folder is used by Flink itself and applies to the entire Flink setup and all jobs running on it, so individual jobs cannot have their own logging configuration. For example, I want job1 to write to file, console, and Logstash, and job2 to write only to a file.

How can each Flink job started from the dashboard be supplied with a separate logging configuration? Is there any way a logging configuration can be passed while submitting a job through the dashboard?

Is there some way to force Flink to use the logging configuration on the job's classpath?

tweeper

1 Answer


Flink currently does not support specifying individual logging configurations per job; the logging configuration always applies to the whole cluster.

A way to solve this problem is to run the jobs in per-job mode, i.e. start a dedicated Flink cluster for every Flink job:

bin/flink run -m yarn-cluster -p 2 MyJobJar.jar
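
Since each per-job cluster reads its own configuration directory, a rough sketch of this approach (untested; paths are placeholders, and it assumes the client picks up the logback.xml found in that directory) would be:

```
# one conf/ copy per job, each with its own logback.xml
export FLINK_CONF_DIR=/path/to/job1-conf
bin/flink run -m yarn-cluster -p 2 MyJobJar.jar
```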
Till Rohrmann
  • Is this a limitation in Flink or is it by design? – tweeper Nov 29 '18 at 10:01
  • It is a bit of a fundamental problem: there are components like the `TaskExecutor` which are already started before a job has been submitted. Moreover, these components can run `Tasks` from multiple jobs. For these components you need to define the logging configuration upfront, because they should log for every job. – Till Rohrmann Nov 29 '18 at 10:22
  • Thanks for the details. We would sometimes need a per-job logging configuration without having a dedicated cluster per job. Going by your comment, it looks like such a feature wouldn't be easy to add. Do you think this is something that can be addressed in Flink going forward? – tweeper Nov 29 '18 at 10:33
  • If you are only interested in logging the user code statements per job, it could be possible by using Flink's child-first class loading (`classloader.resolve-order: child-first`) and then removing `ch.qos.logback` from `classloader.parent-first-patterns.default`. That way you would load Logback from your user code class loader, which should pick up the configuration files from the user code classpath. But I haven't tried it out. – Till Rohrmann Nov 29 '18 at 10:53
  • I tried the steps you mentioned, but this doesn't seem to work. As per the documentation, `child-first` is the default setting, so Flink should have loaded dependencies from the user code first anyway. Also, how would Flink know that it needs to load the configuration (_logback.xml_) from the application classpath? The documentation only talks about dependencies. – tweeper Dec 03 '18 at 07:04
  • You need to overwrite the `classloader.parent-first-patterns.default` value because the default for this option excludes the Logback implementation from being loaded from the user code first. Whether the configuration is then picked up depends on the actual implementation of Logback; that you would need to check. – Till Rohrmann Dec 03 '18 at 07:07
  • I tried setting the following properties in _flink-conf.yaml_: `classloader.resolve-order: child-first` and `classloader.parent-first-patterns.default: java.;scala.;org.apache.flink.;com.esotericsoftware.kryo;org.apache.hadoop.;javax.annotation.;org.slf4j;org.apache.log4j;org.apache.logging;org.apache.commons.logging`. Are these the right settings? I also tried removing `org.slf4j;org.apache.log4j;org.apache.logging;org.apache.commons.logging` from `classloader.parent-first-patterns.default`, but no luck. Note: Flink version 1.6.1 – tweeper Dec 04 '18 at 04:42
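
For reference, the classloading settings discussed in the comments above would look roughly like this in flink-conf.yaml (a sketch of that attempt; per the last comment it did not resolve the issue on Flink 1.6.1):

```yaml
# flink-conf.yaml (sketch of the settings tried in the comments above)
classloader.resolve-order: child-first

# the default list with ch.qos.logback removed, as suggested above
classloader.parent-first-patterns.default: java.;scala.;org.apache.flink.;com.esotericsoftware.kryo;org.apache.hadoop.;javax.annotation.;org.slf4j;org.apache.log4j;org.apache.logging;org.apache.commons.logging
```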