
I am using Structured Streaming to read data from Kafka and compute various aggregate metrics. I have enabled the Graphite sink using metrics.properties. I have seen applications on older Spark versions report streaming-related metrics, but I don't see any streaming-related metrics with Structured Streaming. What am I doing wrong?

For example, I can't find metrics for unprocessed batches, running batches, or the last completed batch's total delay.

I have enabled streaming metrics by setting:

SparkSession.builder().config("spark.sql.streaming.metricsEnabled",true)
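For reference, here is a sketch of that session setup written out in full. It assumes your Spark version also reads metrics.properties entries from SparkConf keys prefixed with `spark.metrics.conf.` (recent versions do; on older ones keep the sink settings in metrics.properties). The app name, host, port, and the `StreamingMetricsSession` object are placeholders. As far as I know, `spark.sql.streaming.metricsEnabled` is checked when each query starts, so it must be set before `start()` is called.

```scala
import org.apache.spark.sql.SparkSession

object StreamingMetricsSession {
  def build(): SparkSession =
    SparkSession.builder()
      .appName("kafka-aggregates") // hypothetical app name
      .master("local[*]")          // local testing only; normally supplied by spark-submit
      // Ask Spark to register a metrics source (input rate, processing rate, latency)
      // for each streaming query started in this session.
      .config("spark.sql.streaming.metricsEnabled", true)
      // Assumption: "spark.metrics.conf."-prefixed keys are honoured by your Spark version.
      .config("spark.metrics.conf.*.sink.graphite.class", "org.apache.spark.metrics.sink.GraphiteSink")
      .config("spark.metrics.conf.*.sink.graphite.host", "graphite.example.com") // placeholder host
      .config("spark.metrics.conf.*.sink.graphite.port", "2003")
      .config("spark.metrics.conf.*.sink.graphite.period", "10")
      .config("spark.metrics.conf.*.sink.graphite.unit", "seconds")
      .getOrCreate()
}
```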

Even then, I only get three metrics:

  • driver.spark.streaming.inputrate
  • driver.spark.streaming.latency
  • driver.spark.streaming.processingrate

These metrics have gaps in them, and they only start showing up long after the application has started. How do I get more extensive streaming metrics into Grafana?

I checked StreamingQueryProgress, but it only seems to let me create custom metrics programmatically. Is there a way to consume the metrics that Spark streaming already sends to the sink I configured?




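You can still find some of those metrics. The StreamingQuery returned when you start the stream has two methods, lastProgress and recentProgress. lastProgress returns the most recent StreamingQueryProgress (or null before the first batch completes), and recentProgress returns an array of the most recent progress updates.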
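A minimal sketch of polling these two methods, assuming you already have a started StreamingQuery. The `ProgressPoller` object and its summary format are hypothetical; the progress fields used (`batchId`, `numInputRows`, `inputRowsPerSecond`, `processedRowsPerSecond`, `durationMs`) are part of StreamingQueryProgress. Note that `durationMs` is a Java map, so a missing key comes back as null.

```scala
import org.apache.spark.sql.streaming.{StreamingQuery, StreamingQueryProgress}

object ProgressPoller {
  // Turn one progress report into a single log-friendly line.
  def summarize(p: StreamingQueryProgress): String = {
    // durationMs is a java.util.Map[String, java.lang.Long]; absent keys return null.
    val triggerMs = Option(p.durationMs.get("triggerExecution")).map(_.longValue).getOrElse(-1L)
    s"batch=${p.batchId} inputRows=${p.numInputRows} " +
      f"inputRate=${p.inputRowsPerSecond}%.1f/s processedRate=${p.processedRowsPerSecond}%.1f/s " +
      s"triggerMs=$triggerMs"
  }

  // Print a summary every intervalMs while the query is running.
  // lastProgress is null until the first micro-batch completes, hence the Option.
  def pollWhileActive(query: StreamingQuery, intervalMs: Long): Unit =
    while (query.isActive) {
      Option(query.lastProgress).foreach(p => println(summarize(p)))
      Thread.sleep(intervalMs)
    }

  // recentProgress holds a bounded history (spark.sql.streaming.numRecentProgressUpdates).
  def history(query: StreamingQuery): Seq[String] =
    query.recentProgress.toSeq.map(summarize)
}
```

Polling is simple but runs on your own thread; it works best for ad hoc inspection rather than continuous export.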

They expose details such as the number of rows processed, the duration of the batch, and the number of input rows in the batch. StreamingQueryProgress also has a json method that returns all of this information in one go, which could be forwarded to a metrics collector.
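One way to get these values into Graphite without polling is a StreamingQueryListener that receives every progress update. The sketch below is hypothetical (the class name, prefix, and host are made up) and writes Graphite's plaintext protocol, one `path value epochSeconds` line per metric, over TCP. It assumes the progress `timestamp` is the ISO-8601 UTC string Spark emits.

```scala
import java.io.{OutputStreamWriter, PrintWriter}
import java.net.Socket
import java.nio.charset.StandardCharsets
import java.time.Instant
import scala.collection.JavaConverters._
import scala.util.control.NonFatal
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

object GraphiteProgressListener {
  // Build Graphite plaintext lines "<prefix>.<query>.<metric> <value> <epochSeconds>".
  // NaN/infinite values (e.g. rates before any rows arrive) are skipped.
  def graphiteLines(prefix: String, queryName: String,
                    values: Seq[(String, Double)], epochSec: Long): Seq[String] = {
    val safeName = queryName.replaceAll("[^A-Za-z0-9_-]", "_")
    values.collect {
      case (metric, v) if !v.isNaN && !v.isInfinite => s"$prefix.$safeName.$metric $v $epochSec"
    }
  }
}

// Hypothetical listener forwarding a few progress fields to Graphite's plaintext port.
class GraphiteProgressListener(host: String, port: Int, prefix: String = "streaming.progress")
    extends StreamingQueryListener {

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val p = event.progress
    val name = Option(p.name).getOrElse(p.id.toString) // unnamed queries fall back to their id
    val ts = Instant.parse(p.timestamp).getEpochSecond
    val durations = p.durationMs.asScala.toSeq.map { case (k, v) => s"durationMs.$k" -> v.doubleValue }
    val values = Seq(
      "batchId" -> p.batchId.toDouble,
      "numInputRows" -> p.numInputRows.toDouble,
      "inputRowsPerSecond" -> p.inputRowsPerSecond,
      "processedRowsPerSecond" -> p.processedRowsPerSecond) ++ durations
    send(GraphiteProgressListener.graphiteLines(prefix, name, values, ts))
  }

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  // One short-lived TCP connection per batch; failures are reported, not rethrown.
  private def send(lines: Seq[String]): Unit =
    try {
      val socket = new Socket(host, port)
      try {
        val out = new PrintWriter(new OutputStreamWriter(socket.getOutputStream, StandardCharsets.UTF_8))
        lines.foreach(line => out.print(line + "\n"))
        out.flush()
      } finally socket.close()
    } catch {
      case NonFatal(e) => System.err.println(s"Graphite send failed: ${e.getMessage}")
    }
}
```

It would be registered with something like `spark.streams.addListener(new GraphiteProgressListener("graphite.example.com", 2003))`. Listener callbacks run on Spark's listener bus thread, so a slow Graphite endpoint can delay other events; a production version might buffer lines or reuse a connection.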

  • Will I find these in Graphite, where the metrics are sent? I can't find them even after setting spark.sql.streaming.metricsEnabled to true. How do I make sure the additional metrics you mentioned are also sent? – passionate Apr 22 '18 at 18:51
  • query.sparkSession.streams.addListener(new StreamingQueryListener() { override def onQueryStarted(queryStarted: QueryStartedEvent): Unit = { logger.info("Query started: " + queryStarted.id + " for QUERY NAME " + query.name) } override def onQueryProgress(queryProgress: QueryProgressEvent): Unit = { logger.info("Query progress: " + queryProgress.progress.json) } override def onQueryTerminated(queryTerminated: QueryTerminatedEvent): Unit = { logger.info("Query terminated: " + queryTerminated.id + " for QUERY NAME " + query.name) } }); logger.info("recentProgress " + query.recentProgress.mkString(", ")); logger.info("progress " + query.lastProgress) – Ajith Kannan Oct 01 '18 at 16:18