
I use Spark Structured Streaming (PySpark) to read data from a Kafka topic. It works well, but when I open the executors' stderr, the whole log page is filled with Kafka WARN messages saying "KafkaDataConsumer is not running in UninterruptibleThread. It may hang when KafkaDataConsumer's methods are interrupted because of KAFKA-1894". How can I disable this warning, or maybe fix the consumer?

Spark: 3.1.1 with org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2

I tried several options, but as far as I know the Kafka consumer doesn't know that it runs inside a Spark application, so it is useless to try sparkContext.setLogLevel. My most recent attempt was something like this:

# goes through py4j, so this only changes the log level in the driver JVM, not on the executors
logger = spark._jvm.org.apache.log4j
logger.LogManager.getLogger("org.apache.kafka").setLevel(logger.Level.ERROR)

But it doesn't work :(

P.S. Yeah, I know that it is just a warning and a warning is not an error, but one executor generates nearly 2k lines of these warnings per second, so you can't find the useful output: you either scroll for a really long time or wait forever for the log file to load. It's kind of frustrating.


1 Answer


Solved this by adding two lines at the end of the default log4j.properties file:

log4j.logger.org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer=ERROR
log4j.additivity.org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer=false
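
For reference, the "default log4j.properties" is just Spark's conf/log4j.properties.template copied and renamed; a minimal version of it looks roughly like the sketch below, with the two KafkaDataConsumer lines above appended at the end:

log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n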

Then add this file to the application and set this Spark config option:

--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///some_path/log4j.properties"
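
For example, the submit command can look roughly like this (my_streaming_app.py is just a placeholder; the file:///some_path/... path has to be readable on every executor node — if it isn't, ship the file with --files and reference it with a relative path instead):

spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///some_path/log4j.properties" \
  my_streaming_app.py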
