I know this question was WRT Spark, but I recently had this issue when using Parquet with Hive in CDH 5.x and found a work-around. Details are here: https://issues.apache.org/jira/browse/SPARK-4412?focusedCommentId=16118403
Contents of my comment from that JIRA ticket below:
This is also an issue in the version of parquet distributed in CDH
5.x. In this case, I am using parquet-1.5.0-cdh5.8.4
(sources available here: http://archive.cloudera.com/cdh5/cdh/5)
However, I've found a work-around for mapreduce jobs submitted via
Hive. I'm sure this can be adapted for use with Spark as well.
- Add the following properties to your job's configuration (in my case, I added them to hive-site.xml, since adding them to mapred-site.xml didn't work):
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
<property>
  <name>mapreduce.child.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
- Create a file named parquet-logging.properties with the following contents:
# Note: I'm sure not every line here is necessary. I just added them to cover all possible
# class/facility names. You will want to tailor this to your needs.
.level=WARNING
java.util.logging.ConsoleHandler.level=WARNING
parquet.handlers=java.util.logging.ConsoleHandler
parquet.hadoop.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.hadoop.handlers=java.util.logging.ConsoleHandler
parquet.level=WARNING
parquet.hadoop.level=WARNING
org.apache.parquet.level=WARNING
org.apache.parquet.hadoop.level=WARNING
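You can sanity-check that a java.util.logging config like this actually silences INFO-level parquet messages with a small standalone program. This is just a sketch: it writes a minimal two-line version of the properties file to a temp location and reloads the JUL config the same way the -Djava.util.logging.config.file flag would at JVM startup; the logger name used here (parquet.hadoop.InternalParquetRecordReader) is only illustrative.

```java
import java.io.File;
import java.io.PrintWriter;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class JulDemo {
    // Load a JUL config equivalent to parquet-logging.properties and report
    // whether a logger under the "parquet" namespace would still emit INFO.
    static boolean parquetInfoLoggable() throws Exception {
        File f = File.createTempFile("parquet-logging", ".properties");
        try (PrintWriter w = new PrintWriter(f)) {
            w.println(".level=WARNING");
            w.println("parquet.level=WARNING");
        }
        // Same effect as passing -Djava.util.logging.config.file=<path>
        // on the JVM command line before startup
        System.setProperty("java.util.logging.config.file", f.getAbsolutePath());
        LogManager.getLogManager().readConfiguration();
        // Illustrative logger name; parquet's loggers live under this namespace
        return Logger.getLogger("parquet.hadoop.InternalParquetRecordReader")
                     .isLoggable(Level.INFO);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("parquet INFO suppressed: " + !parquetInfoLoggable());
    }
}
```

In the real jobs the path in the -D flag is relative, so the properties file has to end up in the task's working directory (or be given as an absolute path reachable from every node).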
With this done, when you run your Hive queries, parquet should log only
WARNING (and higher) level messages to the stdout container logs.