
I'm using Dataproc to fetch data from some BigQuery tables, and I'm being inundated with INFO log messages from what I think is the BigQuery connector. I want to turn these off unless I hit an error. For example, this is what I get:

22/07/15 14:24:04 INFO DirectBigQueryRelation: Going to read from pricing-dev.markdown_spark_temp._sbc_ecdac0aeeebedbdfdc7 columns=[MDS_FAM_ID, GEO_REGION_CD, OP_CMPNY_CD, ITEM_NBR, ITEM_DESC_1, UPC_NBR, BASE_DIV_NBR, CNTRY_NM, DEPT_NBR, DEPT_DESC, MDSE_CATG_NBR, MDSE_CATG_DESC, MDSE_SUBCATG_NBR, MDSE_SUBCATG_DESC, SUBCLASS_NBR, SUBCLASS_DESC, FINELINE_NBR, FINELINE_DESC, ACCTG_DEPT_NBR, ACCTG_DEPT_DESC, DEPT_SUBCATG_NBR, DEPT_SUBCATG_DESC, DEPT_C

They typically come from two sources: DirectBigQueryRelation and BigQueryUtilScala. Based on answers to similar questions, this is what I have tried so far:

from pyspark.sql import SparkSession


class PipelineLogger:
    def __init__(self, spark_session: SparkSession):
        self.spark_session = spark_session
        # Reach the JVM-side log4j classes through the py4j gateway
        log4j = spark_session._jvm.org.apache.log4j
        self.log_manager = log4j.LogManager
        # Attempt to raise the connector's loggers to WARN
        self.log_manager.getLogger('BigQueryUtilScala').setLevel(log4j.Level.WARN)
        self.log_manager.getLogger('DirectBigQueryRelation').setLevel(log4j.Level.WARN)
        # Separate logger for my own pipeline messages
        self.logger = self.log_manager.getLogger(__name__)
        self.logger.info(f"Logger Initialized for App: {spark_session.sparkContext.getConf().get('spark.app.name')}")

However, I'm still getting those INFO messages. Keep in mind that I use self.logger to log messages in a particular way throughout the rest of my PySpark code, which has nothing to do with BigQuery.
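A likely reason the snippet above has no effect: log4j names loggers after the fully qualified class name and passes levels down the dot-separated name hierarchy. The sample log line matches Spark's default log4j pattern, which prints only the last component of the logger name (`%c{1}`), so the output shows `DirectBigQueryRelation` even though the real logger name is longer. Calling `getLogger('DirectBigQueryRelation')` therefore creates a new, unrelated logger. The sketch below demonstrates the same naming rule with Python's standard `logging` module, which uses an equivalent dot-separated hierarchy. It is an analogy, not log4j itself, and the package name `com.google.cloud.spark.bigquery` is an assumption about where the connector's classes live.

```python
import logging

# Mimic Spark's default setup: root logger at INFO.
logging.getLogger().setLevel(logging.INFO)

# Assumed fully qualified class name of the connector (illustrative only).
CONNECTOR_CLASS = "com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation"
connector_logger = logging.getLogger(CONNECTOR_CLASS)

# 1) What the original attempt does: set WARNING on a logger literally named
#    "DirectBigQueryRelation". That logger is unrelated to CONNECTOR_CLASS.
logging.getLogger("DirectBigQueryRelation").setLevel(logging.WARNING)
info_after_short_name = connector_logger.isEnabledFor(logging.INFO)

# 2) Set WARNING on a package prefix instead; every logger below it inherits it.
logging.getLogger("com.google.cloud.spark.bigquery").setLevel(logging.WARNING)
info_after_prefix = connector_logger.isEnabledFor(logging.INFO)

print(info_after_short_name)  # True: INFO lines still get through
print(info_after_prefix)      # False: INFO is now suppressed
```

If the connector's classes do live under that assumed package, the PySpark equivalent would be to pass the package prefix to `getLogger`, for example `log4j.LogManager.getLogger("com.google.cloud.spark.bigquery").setLevel(log4j.Level.WARN)`. This has not been tested against a live Dataproc cluster.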

Any help would be appreciated.

Frank Pinto

1 Answer


What you'll need to do is use an exclusion filter. To create one, go in the console to Stackdriver Logging (now called Cloud Logging) > Logs ingestion > Exclusions and click "Create exclusion". Once the exclusion is in place, matching log entries are no longer ingested, so they will not show up in Stackdriver Logging.

The exclusion filter should look like this:

resource.type="cloud_dataproc_cluster"
textPayload:"INFO DirectBigQueryRelation: Going to read from pricing-dev.markdown_spark_temp._sbc_ecdac0aeeebedbdfdc7 columns=[...]"
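The filter above targets one specific "Going to read from ..." message. A hypothetical helper can generalize it to both classes mentioned in the question. It assumes every unwanted line's `textPayload` contains `INFO <ClassName>`, as the sample log line does, and relies on the same `:` (has) operator the answer's filter uses. `build_exclusion_filter` is an illustrative name, not part of any Google library; its output string would be pasted into the exclusion's filter box.

```python
from typing import Sequence


def build_exclusion_filter(class_names: Sequence[str],
                           resource_type: str = "cloud_dataproc_cluster") -> str:
    """Hypothetical helper: build a Logging filter that drops INFO lines
    from the given classes on Dataproc clusters.

    Assumes each unwanted line's textPayload contains "INFO <ClassName>".
    """
    if not class_names:
        raise ValueError("class_names must not be empty")
    # ':' is the "has" operator, as in the answer's filter; OR combines clauses.
    clauses = " OR ".join(f'textPayload:"INFO {name}"' for name in class_names)
    return f'resource.type="{resource_type}" AND ({clauses})'


print(build_exclusion_filter(["DirectBigQueryRelation", "BigQueryUtilScala"]))
# resource.type="cloud_dataproc_cluster" AND (textPayload:"INFO DirectBigQueryRelation" OR textPayload:"INFO BigQueryUtilScala")
```

Keeping the `INFO ` prefix in each clause means WARN and ERROR lines from the same classes should not match, so they would still be ingested. That fits the goal of seeing these messages only when something goes wrong.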
Jose Gutierrez Paliza