I have been running a Spark Structured Streaming job in Python on Databricks to load data from an Azure IoT Hub. However, I noticed that when we receive a large number of frames, the job slows down and we see latency, even though the metrics show that CPU and memory are not used at 100% of their capacity.

1 Answer

IoT Hub, similarly to EventHubs, has its own throughput limits based on the provisioned capacity, so you can't read more than X MB/sec or N messages/sec.

Also, you need to remember that the EventHubs connector maps EventHubs partitions 1:1 onto Spark partitions, so if EventHubs/IoT Hub has fewer partitions than Spark cores, not all cores are used. As an alternative, you can consider using the Kafka connector to connect to EventHubs/IoT Hub, as it allows having more Spark partitions than there are partitions in EventHubs (see the minPartitions option in the Kafka connector); a sketch follows below.
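For illustration, here's a minimal sketch (not code from the answer) of reading an IoT Hub's Event Hubs-compatible endpoint through Spark's Kafka connector with minPartitions set higher than the hub's partition count. The namespace, hub name, connection string, and the value 32 are hypothetical placeholders; the shaded PlainLoginModule class name assumes a Databricks runtime.

```python
# A minimal sketch, not the answer author's code: reading the Event Hubs-
# compatible Kafka endpoint of an IoT Hub from Spark Structured Streaming.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical placeholders -- substitute your own namespace, hub name,
# and Event Hubs-compatible connection string.
EH_NAMESPACE = "my-namespace"
EH_NAME = "my-iothub-compatible-hub"
EH_CONN_STR = "Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=..."

# Event Hubs authenticates Kafka clients via SASL PLAIN, with the literal
# username "$ConnectionString" and the connection string as the password.
# The "kafkashaded." prefix assumes a Databricks runtime; on open-source
# Spark use org.apache.kafka.common.security.plain.PlainLoginModule.
jaas = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
    f'required username="$ConnectionString" password="{EH_CONN_STR}";'
)

df = (
    spark.readStream.format("kafka")
    # Event Hubs exposes its Kafka-compatible endpoint on port 9093.
    .option("kafka.bootstrap.servers", f"{EH_NAMESPACE}.servicebus.windows.net:9093")
    .option("subscribe", EH_NAME)
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", jaas)
    # Ask Spark for more input partitions than the hub has, so that all
    # cluster cores can take part in processing; 32 is an example value.
    .option("minPartitions", "32")
    .load()
)
```

Note that minPartitions only spreads consumption across more cores; it doesn't raise the MB/sec or messages/sec limits of the provisioned capacity mentioned above.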

Alex Ott