Good day,
I am running a Flink (v1.7.1) streaming job on AWS EMR 5.20, and I would like to have all the TaskManager and JobManager logs of my job in S3. Logback is used, as recommended by the Flink team. As it is a long-running job, I want the logs to be:
- Copied to S3 periodically
- Rolled on time, on size, or on both (as there might be a huge amount of logs)
- Cleaned up from the local disks of the EMR nodes (otherwise the disks will fill up)
What I have tried:
- (1) Enabled logging to S3 when creating the EMR cluster
- (2) Configured YARN rolling log aggregation with: yarn.log-aggregation-enable, yarn.nodemanager.remote-app-log-dir, yarn.log-aggregation.retain-seconds, yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds
- (3) Configured rolling logs in logback.xml:
<appender name="ROLLING" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${log.file}</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>%d{yyyy-MM-dd HH}.%i.log</fileNamePattern>
        <maxFileSize>30MB</maxFileSize>
        <maxHistory>3</maxHistory>
        <totalSizeCap>50MB</totalSizeCap>
    </rollingPolicy>
    <encoder>
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n</pattern>
    </encoder>
</appender>
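For reference, the YARN properties from my second attempt correspond to a yarn-site.xml fragment along these lines. This is only a sketch: the bucket path and the numeric values are placeholders for illustration, not the exact settings of my cluster.

```xml
<!-- Sketch only: the S3 path and the numeric values are placeholders -->
<configuration>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <!-- hypothetical bucket and prefix -->
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>s3://my-bucket/yarn-logs</value>
    </property>
    <property>
        <!-- placeholder retention: 2 days -->
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>172800</value>
    </property>
    <property>
        <!-- placeholder interval: 1 hour -->
        <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
        <value>3600</value>
    </property>
</configuration>
```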
What I have observed so far:
- (1) did help with periodically copying the log files to S3
- (2) has had no visible effect so far: logs are only aggregated when the streaming job ends, and no rolling was observed.
- (3) yielded some results, but they do not meet the requirements yet:
  - the rolled log files are in a cache folder (/mnt/yarn/usercache/hadoop/appcache/application_1549236419773_0002/container_1549236419773_0002_01_000002)
  - only the latest log file is available in the usual YARN logs folder (/mnt/var/log/hadoop-yarn/containers/application_1549236419773_0002/container_1549236419773_0002_01_000002)
  - only the latest log file is available in S3
In short, out of my 3 requirements, I can only satisfy either the first (periodic copy to S3) or the second and third (rolling and local cleanup), but not all three together.
Could you please help me with this?
Thanks and best regards,
Averell