
I'm using org.apache.parquet in a Java program that converts JSON files to Parquet format. However, no matter what I try, I can't disable Parquet's own logging to stdout. Is there any way to change the Parquet logging level, or to turn it off completely?

Example of log messages on stdout...

12-Feb-2017 18:12:21 INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 427B for [verb] BINARY: 2,890 values, 385B raw, 390B comp, 1 pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY], dic { 2 entries, 17B raw, 2B comp}
12-Feb-2017 18:12:21 INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 3,256B for [postedTime] BINARY: 2,890 values, 3,585B raw, 3,180B comp, 1 pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY], dic { 593 entries, 16,604B raw, 593B comp}
12-Feb-2017 18:12:21 INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 4,611B for [message] BINARY: 2,890 values, 4,351B raw, 4,356B comp, 1 pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY], dic { 2,088 entries, 263,329B raw, 2,088B comp}

Example of how I call Parquet...

import java.io.IOException;
import java.util.List;

import org.apache.avro.generic.GenericData;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public void writeToParquet(List<GenericData.Record> recordsToWrite, Path fileToWrite) throws IOException {
    // try-with-resources closes the writer, which also flushes the Parquet file footer
    try (ParquetWriter<GenericData.Record> writer = AvroParquetWriter
            .<GenericData.Record>builder(fileToWrite)
            .withSchema(SCHEMA)
            .withConf(new Configuration())
            .withCompressionCodec(CompressionCodecName.SNAPPY)
            .build()) {

        for (GenericData.Record record : recordsToWrite) {
            writer.write(record);
        }
    }
}
user3188040

1 Answer


I know this is an old question, but I just ran into this issue while using Parquet with Hive on CDH 5.x and found a workaround. See here: https://stackoverflow.com/a/45572400/14186

Perhaps others will find it useful.
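
For reference, here is a minimal sketch of one java.util.logging-based workaround (it may differ in detail from the linked answer). It assumes a Parquet 1.x build, where the org.apache.parquet.Log class attaches Parquet's own console handler to the "org.apache.parquet" JUL logger from a static initializer; on very old builds the package is parquet rather than org.apache.parquet. The ParquetLogSilencer class name is just for illustration.

import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public final class ParquetLogSilencer {

    public static void silence() {
        try {
            // Load org.apache.parquet.Log first: its static initializer is what
            // installs Parquet's console handler. If we strip handlers before it
            // runs, loading the class later will just add them back.
            Class.forName("org.apache.parquet.Log");
        } catch (ClassNotFoundException e) {
            // Class not present in this Parquet version; nothing to trigger.
        }
        Logger parquetLogger = Logger.getLogger("org.apache.parquet");
        for (Handler handler : parquetLogger.getHandlers()) {
            parquetLogger.removeHandler(handler);
        }
        parquetLogger.setUseParentHandlers(false);
        parquetLogger.setLevel(Level.OFF);
    }
}

Call ParquetLogSilencer.silence() once, before building the first ParquetWriter. Alternatively, you can point JUL at a config file via -Djava.util.logging.config.file=... that sets org.apache.parquet.level = SEVERE (or OFF).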

Hercynium