
I am using MapR 5.2 with Spark 2.1.0, and I am running my Spark application jar in YARN cluster mode.

I have tried all the options I could find, but without success.

This is our production environment, but I need my particular Spark job to pick up my log4j-Driver.properties file, which is present in the src/main/resources folder of my project (I confirmed the properties file is present by opening the jar).
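For what it's worth, a quick way to make that check repeatable (a minimal sketch; the jar path is the one used in the submit script below):

# List the jar's contents and confirm the properties file is packaged
jar tf /users/myuser/myapp_2.11-1.0.jar | grep log4j-Driver.properties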

1) Contents of my properties file -> log4j-Driver.properties

log4j.rootCategory=DEBUG, FILE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n    
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/users/myuser/logs/Myapp.log
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.Append=true
log4j.appender.FILE.MaxFileSize=100MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

2) My script for the spark-submit command

propertyFile=application.properties
spark-submit --class MyAppDriver \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/users/myuser/log4j-Driver.properties" \
--master yarn --deploy-mode cluster \
--files /users/myuser/log4j-Driver.properties,/opt/mapr/spark/spark-2.1.0/conf/hive-site.xml,/users/myuser/application.properties \
/users/myuser/myapp_2.11-1.0.jar $propertyFile

All I need, as of now, is to get my driver logs written to the directory mentioned in my properties file (above). If I am successful with that, I will try the same for the executor logs. But first I need this driver log to be written on my local machine (which is an edge node of our cluster).

AJm

1 Answer


/users/myuser/log4j-Driver.properties seems to be the path to the file on your local computer, so you were right to use it for the --files argument.

The problem is that there is no such file on the driver and/or executor, so when you use file:/users/myuser/log4j-Driver.properties as the argument to -Dlog4j.configuration, Log4j will fail to find it.

Since you run on YARN, files listed as arguments to --files will be submitted to HDFS. Each application submission will have its own base directory in HDFS where all the files will be put by spark-submit.
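You can see this for yourself after submitting (a sketch; .sparkStaging is Spark's default staging directory on YARN, and the application ID shown is hypothetical):

# List the files spark-submit staged for the application
hadoop fs -ls /user/myuser/.sparkStaging/application_1513500000000_0001/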

In order to refer to these files, use relative paths (files passed via --files are placed in each container's working directory). In your case --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-Driver.properties" should work.
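Applied to the submit script from the question, that would look like this (a sketch; everything besides the -Dlog4j.configuration value is unchanged from the original command):

propertyFile=application.properties
spark-submit --class MyAppDriver \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-Driver.properties" \
--master yarn --deploy-mode cluster \
--files /users/myuser/log4j-Driver.properties,/opt/mapr/spark/spark-2.1.0/conf/hive-site.xml,/users/myuser/application.properties \
/users/myuser/myapp_2.11-1.0.jar $propertyFile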

Ihor Kaharlichenko
  • Thanks for responding. I tried it the way you mentioned, but I didn't see any file getting created at log4j.appender.FILE.File=/users/myuser/logs/Myapp.log. This is the line from my log4j-Driver.properties file where I specify the directory and the log file name. – AJm Dec 17 '17 at 23:49
  • And again: the file you mentioned is supposed to be created on the driver. Check there's a directory named `/users/myuser/logs` on the driver host. Also make sure that the user used to run Spark driver has permissions to write to this directory. – Ihor Kaharlichenko Dec 18 '17 at 07:44
  • What I am thinking is: since I am submitting in YARN cluster mode, then as per this --> https://www.cloudera.com/documentation/enterprise/5-6-x/images/xspark-yarn-cluster.png.pagespeed.ic.f4CfMwda2i.webp, I am submitting from my edge node, which becomes the client, and the driver sits on any one node of the cluster (excluding the node from which the client submitted the application). That node on which the driver instance is started does not have my log path/directory created, and we cannot know on which node the driver instance will start. Let me know if what I am thinking is correct? – AJm Dec 18 '17 at 16:48
  • The picture you linked to describes the architecture from YARN's point of view. Since you're running with `--deploy-mode cluster`, both the Driver and the Executor(s) run inside YARN containers. Each container is allocated by the YARN ResourceManager according to its own internal logic, which you cannot influence directly. Thus the Driver/Executor code can run on any physical/virtual machine that your cluster is comprised of, so your `/users/myuser/logs` may or may not be present on that machine (see the sketch after these comments for one way around this). – Ihor Kaharlichenko Dec 18 '17 at 17:53
  • This is what I was trying to explain in my comment above. So am I thinking in the right direction? – AJm Dec 18 '17 at 19:40
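A common way to deal with this in cluster mode (a sketch, not from the thread; it relies on the spark.yarn.app.container.log.dir property that Spark sets inside YARN containers) is to point the file appender at the container's own log directory instead of a fixed local path:

# In log4j-Driver.properties: write into the YARN container's log directory,
# which exists on whichever node the driver container lands on
log4j.appender.FILE.File=${spark.yarn.app.container.log.dir}/Myapp.log

The driver log then lives alongside the container's stdout/stderr and can be fetched from the edge node after the application finishes with yarn logs -applicationId <appId>.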