I'm using Dataproc to fetch data from some BigQuery tables, and I'm being inundated with INFO log messages from what I think is the BigQuery connector. I want to suppress these unless I hit an error. For example, this is what I get:
22/07/15 14:24:04 INFO DirectBigQueryRelation: Going to read from pricing-dev.markdown_spark_temp._sbc_ecdac0aeeebedbdfdc7 columns=[MDS_FAM_ID, GEO_REGION_CD, OP_CMPNY_CD, ITEM_NBR, ITEM_DESC_1, UPC_NBR, BASE_DIV_NBR, CNTRY_NM, DEPT_NBR, DEPT_DESC, MDSE_CATG_NBR, MDSE_CATG_DESC, MDSE_SUBCATG_NBR, MDSE_SUBCATG_DESC, SUBCLASS_NBR, SUBCLASS_DESC, FINELINE_NBR, FINELINE_DESC, ACCTG_DEPT_NBR, ACCTG_DEPT_DESC, DEPT_SUBCATG_NBR, DEPT_SUBCATG_DESC, DEPT_C
They typically come from two sources: DirectBigQueryRelation and BigQueryUtilScala. So far, this is what I have tried, based on some other similar questions:
from pyspark.sql import SparkSession


class PipelineLogger:
    def __init__(self, spark_session: SparkSession):
        self.spark_session = spark_session
        # Reach into the JVM via py4j to get at log4j.
        log4j = spark_session._jvm.org.apache.log4j
        self.log_manager = log4j.LogManager
        self.log_manager.getLogger('BigQueryUtilScala').setLevel(log4j.Level.WARN)
        self.log_manager.getLogger('DirectBigQueryRelation').setLevel(log4j.Level.WARN)
        self.logger = self.log_manager.getLogger(__name__)
        self.logger.info(f"Logger Initialized for App: {spark_session.sparkContext.getConf().get('spark.app.name')}")
However, I'm still getting those INFO messages. Keep in mind that I use self.logger to log messages in a particular way throughout the rest of my PySpark code, which has nothing to do with BigQuery.
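One thing I've been wondering, but haven't verified, is whether the bare class names are the problem, since log4j loggers are usually registered under fully qualified class names. Something like this inside __init__, where the package paths are just my guess at the connector's layout:

    # Hypothetical variant: the fully qualified names below are guesses
    # at where the connector registers its loggers -- I haven't
    # confirmed them against the connector's source.
    for name in ('com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation',
                 'com.google.cloud.spark.bigquery.BigQueryUtilScala'):
        self.log_manager.getLogger(name).setLevel(log4j.Level.WARN)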
Any help would be appreciated.