
I am using Python's logging module with PySpark, and PySpark's DEBUG-level messages are flooding my log file with output like the example below. How do I prevent this from happening? A simple solution would be to set the log level to INFO, but then I lose my own Python DEBUG-level messages, which I still need to log.

2015-12-13 15:13:32 4906 DEBUG   : Command to send: j
i
rj
org.apache.spark.SparkConf
e

2015-12-13 15:13:32 4906 DEBUG   : Answer received: yv
2015-12-13 15:13:32 4906 DEBUG   : Command to send: j
i
rj
org.apache.spark.api.java.*
e

2015-12-13 15:13:32 4906 DEBUG   : Answer received: yv
2015-12-13 15:13:32 4906 DEBUG   : Command to send: j
i
rj
org.apache.spark.api.python.*
e
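
For what it's worth, these "Command to send" / "Answer received" lines appear to come from py4j's gateway logger (py4j.java_gateway, a child of the py4j logger), so they can be filtered without lowering your own level. A minimal sketch of the setup the question is after, where the logger name 'myapp' and the file name 'app.log' are illustrative stand-ins, not names from the original post:

import logging

# Root logger and its handler at DEBUG, so our own messages are written out.
logging.basicConfig(filename='app.log', level=logging.DEBUG)

# Quiet the chatty library loggers; their child loggers inherit these levels.
logging.getLogger('py4j').setLevel(logging.ERROR)
logging.getLogger('pyspark').setLevel(logging.ERROR)

logger = logging.getLogger('myapp')  # hypothetical application logger
logger.debug('my own debug message still lands in app.log')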
Michael

5 Answers


You can set the logging level for each logger separately:

import logging

pyspark_log = logging.getLogger('pyspark')
pyspark_log.setLevel(logging.ERROR)
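
Because levels are inherited down the logger hierarchy, setting 'pyspark' also covers any of its child loggers. A quick sketch to verify, where 'myapp' is an illustrative name for your own logger and 'pyspark.xyz' is just a placeholder child:

import logging

logging.basicConfig(level=logging.DEBUG)
logging.getLogger('pyspark').setLevel(logging.ERROR)

logging.getLogger('myapp').debug('printed: root level is DEBUG')      # 'myapp' is illustrative
logging.getLogger('pyspark.xyz').debug('suppressed: inherits ERROR')  # placeholder child logger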
Andy

I had the same issue; I used the following and everything worked fine.

import logging

logging.getLogger('pyspark').setLevel(logging.ERROR)
logging.getLogger('py4j').setLevel(logging.ERROR)
logging.getLogger('matplotlib').setLevel(logging.ERROR)

I was also getting some matplotlib logs, so I raised the matplotlib logger's level too; if you don't have that issue, you can remove that line.
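
If more libraries turn out to be noisy, the same idea scales with a small loop (the list of names here is just an example):

import logging

for name in ('pyspark', 'py4j', 'matplotlib'):  # extend with other noisy libraries as needed
    logging.getLogger(name).setLevel(logging.ERROR)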

H S Rathore

The key component is "py4j". You just need to add this at the beginning of your program:

py4j_logger = logging.getLogger("py4j")
py4j_logger.setLevel(logging.INFO)

Or just:

logging.getLogger("py4j").setLevel(logging.INFO)
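
For example, placed at the top of a script before Spark starts up (the SparkSession setup below is standard PySpark 2.x+ boilerplate, not part of the original answer):

import logging
from pyspark.sql import SparkSession

logging.basicConfig(level=logging.DEBUG)          # your own loggers stay at DEBUG
logging.getLogger("py4j").setLevel(logging.INFO)  # set early so the gateway's startup traffic is filtered too

spark = SparkSession.builder.appName("example").getOrCreate()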
Z.Wei
import logging

logging.basicConfig(level=logging.DEBUG)
logging.getLogger('py4j').setLevel(logging.INFO)     # setLevel(logging.ERROR) works just as well
logging.getLogger('pyspark').setLevel(logging.INFO)

logging.info('Task is successful.')
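
The interplay here is worth spelling out: basicConfig configures the root logger at DEBUG, while a named logger's own level, once set, takes precedence for it and its descendants. This can be verified with getEffectiveLevel():

import logging

logging.basicConfig(level=logging.DEBUG)
logging.getLogger('py4j').setLevel(logging.INFO)

print(logging.getLogger().getEffectiveLevel())                     # 10 == logging.DEBUG
print(logging.getLogger('py4j').getEffectiveLevel())               # 20 == logging.INFO
print(logging.getLogger('py4j.java_gateway').getEffectiveLevel())  # 20, inherited from 'py4j'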
Rong Du

The best way to control pyspark and py4j logging is with the following snippet:

import logging
logging.getLogger("py4j").setLevel(<py4j-level>)
logging.getLogger('pyspark').setLevel(<pyspark-level>)
logger = logging.getLogger('pyspark')

In your case, you would write:

import logging
logging.getLogger("py4j").setLevel(logging.WARNING)
logging.getLogger('pyspark').setLevel(logging.DEBUG)
logger = logging.getLogger('pyspark')
mangelfdz