
I have a Spark Structured Streaming job that reads data from an Azure Data Lake, applies some transformations, and then writes into Azure Synapse (DW). I want to log some metrics for each batch processed, but I don't want duplicated logs from each batch. Is there any way to log only once, with some export_interval?

Example:

autoloader_df = (
    spark.readStream.format("cloudFiles")
    .options(**stream_config["cloud_files"])
    .option("recursiveFileLookup", True)
    .option("maxFilesPerTrigger", sdid_workload.max_files_agg)
    .option("pathGlobFilter", "*_new.parquet")
    .schema(stream_config["schema"])
    .load(stream_config["read_path"])
    .withColumn(stream_config["file_path_column"], input_file_name())
)
stream_query = (
    autoloader_df.writeStream.format("delta")
    .trigger(availableNow=True)
    .option("checkpointLocation", stream_config["checkpoint_location"])
    .foreachBatch(
        lambda df_batch, batch_id: ingestion_process(
            df_batch, batch_id, sdid_workload, stream_config, logger=logger
        )
    )
    .start()
)

Where ingestion_process is defined as follows:

def ingestion_process(df_batch, batch_id, sdid_workload, stream_config, **kwargs):
    logger: AzureLogger = kwargs.get("logger")
    iteration_start_time = datetime.utcnow()
    sdid_workload.ingestion_iteration += 1
    general_transformations(sdid_workload)
    log_custom_metrics(sdid_workload)


In log_custom_metrics I'm using:

exporter = metrics_exporter.new_metrics_exporter(connection_string=appKey, export_interval=12)
view_manager.register_exporter(exporter)

I don't want duplicated logs.

1 Answer


In case anyone stumbles upon this post: I was able to find a workaround for this in https://github.com/census-instrumentation/opencensus-python/issues/1070

Other related issues: https://github.com/census-instrumentation/opencensus-python/issues/1029 and https://github.com/census-instrumentation/opencensus-python/issues/963
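The duplication typically comes from `foreachBatch` calling `new_metrics_exporter` / `register_exporter` on every micro-batch, so each batch adds another exporter that re-exports the same views. One way around that (independent of the linked issue's details) is to create and register the exporter exactly once and reuse it across batches. A minimal sketch of that guard, where `get_exporter` is a hypothetical helper and the dict stands in for the real `metrics_exporter.new_metrics_exporter(...)` call:

```python
import functools

@functools.lru_cache(maxsize=None)
def get_exporter(connection_string, export_interval=12):
    # Stand-in for:
    #   exporter = metrics_exporter.new_metrics_exporter(
    #       connection_string=connection_string,
    #       export_interval=export_interval)
    #   view_manager.register_exporter(exporter)
    # Thanks to lru_cache, the body runs only on the first call;
    # every later batch gets the same cached exporter back instead
    # of registering a new one.
    exporter = {"connection_string": connection_string,
                "export_interval": export_interval}
    return exporter
```

With a guard like this, `log_custom_metrics` can call `get_exporter(appKey)` on every batch without ever registering a second exporter; only the recorded measurements change per batch, so each metric is exported once per interval instead of once per exporter.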

  • While the links may solve the problem, it's better to include the relevant information in the answer itself in case the links break – DeadChex Dec 30 '22 at 19:26