
I am working on an IoT analytics solution which consumes Avro-formatted messages fired at an Azure IoT Hub and (hopefully) uses Stream Analytics to store messages in Data Lake and blob storage. A key requirement is that the Avro containers must appear in storage exactly as they did when presented to the IoT Hub, for the benefit of downstream consumers.

I am running into a limitation in Stream Analytics around granular control over individual file creation. When setting up a new output stream path, I can only provide date and hour tokens in the path prefix, which results in one file per hour instead of one file per message received. The customer requires separate blob containers for each device and separate blobs for each event. Similarly, the Data Lake requirement dictates at least a sane naming convention delineated by device, with separate files for each event ingested.
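For context, the blob output's path prefix only understands the {date} and {time} tokens, so the most granular pattern I can express looks something like the following (the container and prefix names here are just examples):

```
devicetelemetry/{date}/{time}
```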

Has anyone successfully configured Stream Analytics to create a new file every time it pops a message off of the input? Is this a hard product limitation?


1 Answer


Stream Analytics is indeed oriented toward efficient processing of large streams. For your use case, you need an additional component to implement your custom logic.

Stream Analytics can output to Blob storage, Event Hubs, Table storage or Service Bus. Another option is to use the new IoT Hub routes to route directly to an Event Hub or a Service Bus queue or topic.

From there you can write an Azure Function (or, from Blob or Table Storage, a custom Data Factory activity) and use the Data Lake Store SDK to write files with the logic that you need.
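As a rough illustration of the Function approach, here is a minimal sketch assuming a Python Azure Function with a per-message Event Hub trigger (cardinality set to "one" in function.json) and the azure-datalake-store (Gen1) SDK. The environment variable names and the /telemetry path layout are placeholders, not anything prescribed by Azure.

```python
# Sketch only: writes each incoming event as its own file in Data Lake Store (Gen1).
import os
import uuid

import azure.functions as func
from azure.datalake.store import core, lib


def main(event: func.EventHubEvent):
    # Authenticate to the Data Lake Store account with a service principal.
    token = lib.auth(
        tenant_id=os.environ["TENANT_ID"],
        client_id=os.environ["CLIENT_ID"],
        client_secret=os.environ["CLIENT_SECRET"],
    )
    adls = core.AzureDLFileSystem(token, store_name=os.environ["ADLS_STORE_NAME"])

    # Assumes the event surfaces the IoT Hub system metadata; adjust if your
    # messages carry the device identity differently.
    device_id = (event.iothub_metadata or {}).get(
        "connection-device-id", "unknown-device"
    )

    # One file per message, grouped by device.
    path = f"/telemetry/{device_id}/{uuid.uuid4()}.avro"
    with adls.open(path, "wb") as stream:
        stream.write(event.get_body())
```

The same per-message logic could live in a custom Data Factory activity instead; the Function route just gives you the lowest latency from ingestion to file creation.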

  • Thanks Alexandre, this is effectively what we ended up going with. We are already using routes to break out overarching categories of messages based on a type property but this would have been a bit excessive with millions of targets. The customer was willing to bend on the initial ingestion so we are going straight to storage, then leveraging ADF. – Pete M Jan 03 '17 at 04:55
  • @PeteM were you able to achieve this? We are also looking for a similar solution. Can you help with the steps you followed? – kudlatiger Jan 23 '19 at 09:27
  • @kudlatiger IIRC for that solution we allowed ASA to stream hourly files into blob storage. A scheduled ADF job picks it up, reads the whole file and outputs into Data Lake via U-SQL with a CSV extractor. Since it's reading and writing on a per-message basis, the U-SQL has access to the device id or anything else we want when determining the output path for that record. – Pete M Feb 07 '19 at 19:03