18

How can I stop such messages from appearing on my spark-shell console?

5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 89213 records.
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 2 ms. row count = 120141
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 2 ms. row count = 89213
5 May, 2015 5:14:30 PM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutp
[Stage 12:=================================================>    (184 + 4) / 200]

Thanks

user568109

6 Answers

4

The solution from a SPARK-8118 issue comment seems to work:

You can disable the chatty output by creating a properties file with these contents:

org.apache.parquet.handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=SEVERE

Then pass the path of the file to Spark when the application is submitted. Assuming the file lives at /tmp/parquet.logging.properties (of course, it needs to be available on all worker nodes):

spark-submit \
    --conf spark.driver.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
    --conf spark.executor.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
    ...
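
Since the question is about spark-shell, the same --conf options should also work when launching the shell (a sketch, not from the original answer, assuming the same /tmp/parquet.logging.properties file is available on the driver and all workers):

spark-shell \
    --conf spark.driver.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
    --conf spark.executor.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties"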

Credits go to Justin Bailey.

serega
  • This worked for me after I added the line org.apache.parquet.hadoop.handlers=java.util.logging.ConsoleHandler to the properties file. – Javier Jan 25 '17 at 20:33
3

I believe this has regressed; there are some large merges/changes being made to the Parquet integration. See https://issues.apache.org/jira/browse/SPARK-4412

Yana K.
3

This works for Spark 2.0. Edit the file spark/log4j.properties and add:

log4j.logger.org.apache.spark.sql.execution.datasources.parquet=ERROR
log4j.logger.org.apache.spark.sql.execution.datasources.FileScanRDD=ERROR
log4j.logger.org.apache.hadoop.io.compress.CodecPool=ERROR

The lines for FileScanRDD and CodecPool will help with a couple of logs that are very verbose as well.
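
If you would rather not edit the file, roughly the same effect can be achieved from inside spark-shell with the log4j 1.x API that ships with Spark 2.0 (a minimal sketch using the same logger names; this is not part of the original answer):

import org.apache.log4j.{Level, Logger}

// Quiet the same three logger names mentioned above
Logger.getLogger("org.apache.spark.sql.execution.datasources.parquet").setLevel(Level.ERROR)
Logger.getLogger("org.apache.spark.sql.execution.datasources.FileScanRDD").setLevel(Level.ERROR)
Logger.getLogger("org.apache.hadoop.io.compress.CodecPool").setLevel(Level.ERROR)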

Javier
1

I know this question was WRT Spark, but I recently had this issue when using Parquet with Hive in CDH 5.x and found a work-around. Details are here: https://issues.apache.org/jira/browse/SPARK-4412?focusedCommentId=16118403

Contents of my comment from that JIRA ticket below:

This is also an issue in the version of parquet distributed in CDH 5.x. In this case, I am using parquet-1.5.0-cdh5.8.4 (sources available here: http://archive.cloudera.com/cdh5/cdh/5)

However, I've found a work-around for mapreduce jobs submitted via Hive. I'm sure this can be adapted for use with Spark as well.

  • Add the following properties to your job's configuration (in my case, I added them to hive-site.xml, since adding them to mapred-site.xml didn't work):

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
<property>
  <name>mapreduce.child.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
  • Create a file named parquet-logging.properties with the following contents:

# Note: I'm certain not every line here is necessary. I just added them to cover all possible
# class/facility names. You will want to tailor this to your needs.
.level=WARNING
java.util.logging.ConsoleHandler.level=WARNING

parquet.handlers=java.util.logging.ConsoleHandler
parquet.hadoop.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.hadoop.handlers=java.util.logging.ConsoleHandler

parquet.level=WARNING
parquet.hadoop.level=WARNING
org.apache.parquet.level=WARNING
org.apache.parquet.hadoop.level=WARNING
  • Add the file to the job. In Hive, this is most easily done like so:
    ADD FILE /path/to/parquet-logging.properties;

With this done, when you run your Hive queries, parquet should only log WARNING (and higher) level messages to the stdout container logs.
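
To adapt the same work-around for Spark (as suggested above), one possible sketch is to ship the properties file with --files and point java.util.logging at it; the exact paths depend on deploy mode, so treat this as an assumption rather than a tested recipe:

spark-submit \
    --files /path/to/parquet-logging.properties \
    --conf spark.driver.extraJavaOptions="-Djava.util.logging.config.file=/path/to/parquet-logging.properties" \
    --conf spark.executor.extraJavaOptions="-Djava.util.logging.config.file=parquet-logging.properties" \
    ...

(In client mode the driver reads the local path, while executors pick up the copy that --files places in their working directory.)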

Hercynium
0

To turn off all messages except ERROR, you should edit your conf/log4j.properties file, changing the following line:

log4j.rootCategory=INFO, console

into

log4j.rootCategory=ERROR, console

Hope it helps!

Fabio Fantoni
    This actually works for log4j logs from Spark, but not for these. These parquet logs seem to be produced by something else? The log4j.rootCategory setting does not affect them. – borck May 05 '15 at 20:14
0

Not a solution, but if you build your own Spark, this file generates most of these log messages, and you can comment them out for now: https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileReader.java

M.Rez