
I'm running a Spark job on EMR (YARN, cluster mode, transient: the cluster shuts down after the job is done) with debug mode turned on. All Spark logs are uploaded to S3 as expected, but I can't get my own custom logs uploaded. Using log4j, I'm trying to write them to the following path, according to the Spark docs: log4j.appender.algoLog.File=${spark.yarn.app.container.log.dir}/algoLog.log

It seems like the variable is undefined, so it tries to write directly to the root: /algoLog.log. If I write to some other arbitrary location, the file just doesn't appear on S3. Where should I write my own log files if I want EMR to upload them to S3 after the cluster shuts down?
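
For reference, this is roughly how the appender is declared in my log4j.properties; the logger name com.mycompany.algo below is only a placeholder for my own package:

    # Placeholder excerpt from log4j.properties; com.mycompany.algo stands in for my real package
    log4j.logger.com.mycompany.algo=INFO, algoLog
    log4j.additivity.com.mycompany.algo=false
    log4j.appender.algoLog=org.apache.log4j.FileAppender
    # This is the line that ends up resolving to /algoLog.log when the variable is undefined
    log4j.appender.algoLog.File=${spark.yarn.app.container.log.dir}/algoLog.log
    log4j.appender.algoLog.layout=org.apache.log4j.PatternLayout
    log4j.appender.algoLog.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n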

NetanelRabinowitz

1 Answer


Log4J isn't set up to write to object stores; its notion of a filesystem is different.

You may be able to get YARN to do it with its log collection. See How to keep YARN's log files?
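
Something along these lines in yarn-site.xml turns on YARN log aggregation; I haven't verified it against EMR, and the s3a:// destination in particular is an assumption — the default destination is an HDFS path:

    <!-- Sketch only: enables YARN log aggregation; the destination bucket is an assumption -->
    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.nodemanager.remote-app-log-dir</name>
      <!-- default is an HDFS path such as /tmp/logs; an s3a:// bucket here is untested -->
      <value>s3a://my-log-bucket/yarn-app-logs</value>
    </property>
    <property>
      <name>yarn.log-aggregation.retain-seconds</name>
      <value>604800</value>
    </property>

Once aggregation has run for a finished application you can pull its logs with yarn logs -applicationId <appId>.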

stevel
  • I didn't try to give log4j an S3 path. I gave it a local path, and I'm expecting EMR to deliver those files into its S3 logging bucket. – NetanelRabinowitz Jul 10 '17 at 11:19
  • That's not something I've played with; I do know YARN logging is designed to collect data from across the cluster and serve it up for viewing... though even there I don't know about S3 integration. Sorry. – stevel Jul 11 '17 at 10:30