I have been running a Spark Structured Streaming job in Python on Databricks that reads messages from an Azure IoT Hub. However, I noticed that when we receive a large number of frames, the job slows down and we accumulate latency, even though the metrics show that CPU and memory are not being used at 100% of their capacity.
IoT Hub, like Event Hubs, has its own throughput limits based on the provisioned capacity, so you can't read more than X MB/sec or N messages/sec regardless of cluster size.

Also, keep in mind that the EventHubs connector maps Event Hubs partitions 1:1 onto Spark partitions, so if your Event Hubs/IoT Hub has fewer partitions than you have Spark cores, not all cores are used. As an alternative, consider using the Kafka connector to read from Event Hubs/IoT Hub, since it allows more Spark partitions than the hub has physical partitions (see the minPartitions option of the Kafka connector, and the sketch below).
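For reference, here is a minimal sketch of reading the IoT Hub's built-in Event Hubs-compatible endpoint through Spark's Kafka connector. The namespace, hub name, secret scope, and the value 32 are placeholders you'd adapt; the Kafka-compatible endpoint listens on port 9093 and authenticates via SASL PLAIN, using the literal username `$ConnectionString` and the connection string as the password:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder values - substitute your IoT Hub's Event Hubs-compatible
# endpoint, hub name, and connection string (ideally from a secret scope).
bootstrap_servers = "<namespace>.servicebus.windows.net:9093"
topic = "<event-hub-compatible-name>"
connection_string = dbutils.secrets.get("<scope>", "<iothub-conn-str-key>")

# The Event Hubs Kafka endpoint uses SASL PLAIN with the literal user
# name "$ConnectionString" and the connection string as the password.
eh_sasl = (
    "org.apache.kafka.common.security.plain.PlainLoginModule required "
    f'username="$ConnectionString" password="{connection_string}";'
)

df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", bootstrap_servers)
    .option("subscribe", topic)
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", eh_sasl)
    # Ask Spark to split the input into at least 32 partitions per
    # micro-batch even if the hub has fewer physical partitions, so
    # that all cores can participate in processing.
    .option("minPartitions", "32")
    .load()
)
```

Note that minPartitions only subdivides the fetched offset ranges on the Spark side; it does not raise the hub's own throughput cap, so the provisioned capacity still limits the total MB/sec you can read.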

Alex Ott