I have gone through the following questions and pages looking for an answer, but they did not solve my problem:
Logger is not working inside spark UDF on cluster
https://www.javacodegeeks.com/2016/03/log-apache-spark.html
We are using Spark in standalone mode, not on YARN. I have configured log4j.properties on both the driver and the executors to define a custom logger, "myLogger". The file, replicated on every node, is as follows:
log4j.rootLogger=INFO, Console_Appender, File_Appender
log4j.appender.Console_Appender=org.apache.log4j.ConsoleAppender
log4j.appender.Console_Appender.Threshold=INFO
log4j.appender.Console_Appender.Target=System.out
log4j.appender.Console_Appender.layout=org.apache.log4j.PatternLayout
log4j.appender.Console_Appender.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
log4j.appender.File_Appender=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.File_Appender.Threshold=INFO
log4j.appender.File_Appender.File=/opt/spark_log/app_log.txt
log4j.appender.File_Appender.RollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.File_Appender.TriggeringPolicy=org.apache.log4j.rolling.SizeBasedTriggeringPolicy
log4j.appender.File_Appender.RollingPolicy.FileNamePattern=/opt/spark_log/app_log.%d{MM-dd-yyyy}.%i.txt.gz
log4j.appender.File_Appender.RollingPolicy.ActiveFileName=/opt/spark_log/app_log.txt
log4j.appender.File_Appender.TriggeringPolicy.MaxFileSize=1000
log4j.appender.File_Appender.layout=org.apache.log4j.PatternLayout
log4j.appender.File_Appender.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c - %m%n
log4j.logger.myLogger=INFO,File_Appender
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
In my Java application, I obtain the logger with the following line:
private static Logger logger = LogManager.getLogger("myLogger");
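For context, the Logger and LogManager here are the log4j 1.x classes (org.apache.log4j) that Spark 2.4 bundles, not log4j2. A minimal sketch of the declaration in the driver class (the class name is taken from the spark-submit command below; the rest is trimmed):

import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

public class SparkApp {
    // the name must match the "log4j.logger.myLogger" entry in log4j.properties
    private static Logger logger = LogManager.getLogger("myLogger");

    // ... driver code and UDF registrations follow
}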
I am running the application using the following command:
spark-submit --driver-java-options "-Dlog4j.configuration=file:///opt/spark/spark-2.4.4/conf/log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/spark-2.4.4/conf/log4j.properties" --class com.test.SparkApp file:///opt/test/cepbck/test.person.app-0.0.7.jar
When I run the application on the cluster, the logs from the main driver class appear fine in both the console and the log file. But as soon as control passes inside a UDF, no logs are printed. I have also opened the log files on the executors, but they do not contain any of the log statements I added either. Please help me in this regard.
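For reference, the UDFs in question follow roughly the pattern sketched below (the UDF name, column names and sample data here are made up just to illustrate the structure; the real application is more involved):

package com.test;

import java.util.Arrays;

import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

public class SparkApp {

    private static Logger logger = LogManager.getLogger("myLogger");

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("person-app").getOrCreate();

        // Driver-side log: this shows up in the console and in app_log.txt
        logger.info("Application started");

        spark.udf().register("cleanName", (UDF1<String, String>) name -> {
            // Executor-side log: this is the kind of statement that never appears anywhere
            Logger udfLogger = LogManager.getLogger("myLogger");
            udfLogger.info("Inside UDF, processing: " + name);
            return name == null ? null : name.trim();
        }, DataTypes.StringType);

        Dataset<Row> people = spark
                .createDataset(Arrays.asList("  Alice ", " Bob"), Encoders.STRING())
                .toDF("person_name");

        people.withColumn("clean_name", callUDF("cleanName", col("person_name")))
              .show();

        spark.stop();
    }
}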