0

I am trying to retrieve blobs based on filters. I have created a device in IoT Hub; its telemetry data is received and routed to a storage account as blobs. Now I want to retrieve those blobs using Node.js.

Is it possible to write an API that returns only the blobs matching my filters, without traversing the whole container?

1 Answer

2

By default, IoT Hub message routing to Azure Storage creates blobs with the naming convention {iothub}/{partition}/{YYYY}/{MM}/{DD}/{HH}/{mm} inside the selected container. So you have a predictable blob prefix that can be used in the query filter (more on that later). Note that {partition} is the zero-indexed ID of the partition on which the message was ingested. For example, if you chose 4 partitions (the default) while creating the IoT hub instance, the partition IDs would be 0, 1, 2 and 3.
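As a minimal sketch of that convention, the prefix for a given partition and minute could be built as below. The helper name buildBlobPrefix and the hub name myIotHub are made up for illustration, and the sketch assumes the date parts in the blob path are in UTC, which you should verify against your own container.

```javascript
// Hypothetical helper (not part of any SDK): builds the default
// {iothub}/{partition}/{YYYY}/{MM}/{DD}/{HH}/{mm} prefix.
// Assumption: the path components are UTC-based.
function buildBlobPrefix(hubName, partition, date) {
  const pad = (n) => String(n).padStart(2, "0");
  return [
    hubName,
    partition,
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(date.getUTCDate()),
    pad(date.getUTCHours()),
    pad(date.getUTCMinutes()),
  ].join("/");
}

// Partition 0, 2020-01-01 17:45 UTC
console.log(buildBlobPrefix("myIotHub", 0, new Date(Date.UTC(2020, 0, 1, 17, 45))));
// prints "myIotHub/0/2020/01/01/17/45"
```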

Now for the filtering part. You will most likely want to list blobs (and then read their content) by time range, since that is what cold path analytics usually needs. Unfortunately, you can't filter blobs by device ID, because a single blob may contain messages from multiple devices. So, assuming your cold path analytics processes batches (most probably as a continuous job) over a sliding time range, below is a sample query (oversimplified, of course; read the inline comments carefully) using the @azure/storage-blob package (v12 JavaScript SDK). Check the API reference and adapt it to your needs.

const { BlobServiceClient } = require("@azure/storage-blob");

const blobServiceClient = BlobServiceClient.fromConnectionString('myStorageConnectionString');
const containerClient = blobServiceClient.getContainerClient('myContainer');

// Add logic here to select the time range.
// For simplicity this uses a hardcoded range of 2020-01-01 5:45 pm to 2020-01-01 5:46 pm
// (just a 1 minute range)

// await is only valid inside an async function in CommonJS, so wrap the loop
async function listBlobs() {
  // assuming the IoT hub has 4 partitions
  for (let partition = 0; partition < 4; partition++) {
    // note: the prefix below picks up all blobs written from 17:45:00 to 17:45:59.xxx
    const iter = containerClient.listBlobsByHierarchy("/", { prefix: `myIotHub/${partition}/2020/01/01/17/45` });
    let entity = await iter.next();
    while (!entity.done) {
      const item = entity.value;
      if (item.kind === "prefix") {
        console.log(`\tBlobPrefix: ${item.name}`);
      } else {
        // At this point you might want to read the blob content. For simplicity, only a few blob properties are printed
        console.log(`\tBlobItem: name - ${item.name}, last modified - ${item.properties.lastModified}`);
      }
      entity = await iter.next();
    }
  }
}

listBlobs().catch(console.error);
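
To go one step further and read the content of each listed blob, something like the following sketch might work. The function name readBlobsUnderPrefix is made up for illustration. It assumes containerClient is a ContainerClient from @azure/storage-blob v12 and relies on listBlobsFlat, getBlobClient and downloadToBuffer (the last one is available in the Node.js runtime only). How you decode each buffer depends on the encoding configured for the route (Avro or JSON), so that part is left out.

```javascript
// Sketch: download every blob whose name starts with `prefix`.
// `containerClient` is expected to be a ContainerClient (@azure/storage-blob v12), e.g.
//   const { BlobServiceClient } = require("@azure/storage-blob");
//   const containerClient = BlobServiceClient
//     .fromConnectionString("myStorageConnectionString")
//     .getContainerClient("myContainer");
async function readBlobsUnderPrefix(containerClient, prefix) {
  const results = [];
  // listBlobsFlat returns an async iterable, so `for await` takes care of paging
  for await (const blob of containerClient.listBlobsFlat({ prefix })) {
    // downloadToBuffer loads the whole blob into memory; fine for small telemetry files
    const content = await containerClient.getBlobClient(blob.name).downloadToBuffer();
    results.push({
      name: blob.name,
      lastModified: blob.properties.lastModified,
      content, // raw bytes, still to be decoded by the caller
    });
  }
  return results;
}
```

For example, `await readBlobsUnderPrefix(containerClient, "myIotHub/0/2020/01/01/17/45")` would return the blobs written by partition 0 during that minute. For large blobs, streaming with download() is probably a better fit than buffering.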

krishg
  • So does it mean there is no way to dynamically filter the data on the server side through an API? – Hareem rehan Nov 25 '20 at 09:50
  • Traversing a large number of blobs takes a lot of time, and I am also not sure which blob holds the relevant device data on which I'll be executing my functions. Secondly, there are multiple devices in a nested hierarchy, and I'll be fetching device information on that basis, but it's very time-consuming with blobs. Is there any way I can directly fetch the relevant data? – Hareem rehan Nov 25 '20 at 09:57
  • Blob storage is meant for storing files (unstructured data). If you want query analytics, either choose a different type of data store (like SQL or NoSQL) or use an analytics engine (like Databricks, Synapse etc.) on top of the blob/data lake store. It's more of an architectural/design decision based on your requirements. – krishg Nov 25 '20 at 10:48
  • Is it achievable using a device twin in IoT Hub? – Hareem rehan Nov 30 '20 at 09:23
  • A [device twin](https://learn.microsoft.com/azure/iot-hub/iot-hub-devguide-device-twins) is a different thing from where you store your telemetry data. So the problem remains the same. – krishg Nov 30 '20 at 09:44
  • Thank you so much for answering my questions patiently. I am new to Azure and still not sure about a lot of its features. I'll look for a workaround for my problem, but your answers helped me a lot. – Hareem rehan Nov 30 '20 at 10:53
  • One more thing: can I add a date range to fetch the blobs? Here we used let iter = containerClient.listBlobsByHierarchy("/", { prefix: `myIotHub/${partition}/2020/01/01/17/45` }); with just a single date and time, but what if I want a range, e.g. from 2020/00/01/12/17/59 -20/20/00/01/12/18/05? – Hareem rehan Dec 01 '20 at 12:58
  • A range can be done by truncating the prefix at the hour, day, month and so on. So if you need the 17:00 hour of 2020-12-01 (1 hour duration), you can use a prefix like myIotHub/${partition}/2020/12/01/17. But a range of minutes within an hour is not possible with a single prefix. If you add the minute part to the prefix, it returns blobs from that minute only. – krishg Dec 01 '20 at 13:25
  • My requirement is that a date-time picker on the client side sends a date range, and the data should fall within that range, e.g. from the 1st of Nov till the 30th of Nov. Can I do that? And is there any function that can directly list all the partitions in the container, rather than hardcoding them? – Hareem rehan Dec 01 '20 at 13:37
  • As suggested earlier, blob storage is not a good fit for complex filtering. You should store the data in a structured data store after processing it (analytics) for your scenario. Now that I know you are thinking of exposing IoT telemetry data from blobs to a frontend, I am even more convinced, and concerned, that you are not on the right track. So I would not suggest Stack Overflow for this anymore, since it is more about the design of your application architecture, and I am sorry, but this forum is not the right place for that. https://stackoverflow.com/help/on-topic. – krishg Dec 01 '20 at 15:48
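
On the two follow-up questions in the comments (date ranges and hardcoded partitions), here is a rough sketch of what is possible with prefixes alone. A single prefix cannot express a range, but a range could be covered by generating one prefix per hour (or per minute) and listing each one in turn. Partition folders can be discovered with a hierarchy listing one level below the hub name. The helper names prefixesForRange and listPartitions are made up for illustration, the path is assumed to be UTC-based, and containerClient is assumed to be a ContainerClient from @azure/storage-blob v12.

```javascript
// Hypothetical helper: one prefix per hour (or per minute) covering [start, end).
// Assumption: the date parts of the blob path are in UTC.
function prefixesForRange(hubName, partition, start, end, granularity = "hour") {
  const pad = (n) => String(n).padStart(2, "0");
  const stepMs = granularity === "minute" ? 60 * 1000 : 60 * 60 * 1000;
  const prefixes = [];
  // Round the start down to the beginning of its hour/minute
  let t = Math.floor(start.getTime() / stepMs) * stepMs;
  for (; t < end.getTime(); t += stepMs) {
    const d = new Date(t);
    let prefix = `${hubName}/${partition}/${d.getUTCFullYear()}/${pad(d.getUTCMonth() + 1)}` +
      `/${pad(d.getUTCDate())}/${pad(d.getUTCHours())}`;
    if (granularity === "minute") prefix += `/${pad(d.getUTCMinutes())}`;
    prefixes.push(prefix);
  }
  return prefixes;
}

// Hypothetical helper: discover partition folders instead of hardcoding 0..3.
async function listPartitions(containerClient, hubName) {
  const partitions = [];
  const iter = containerClient.listBlobsByHierarchy("/", { prefix: `${hubName}/` });
  for await (const item of iter) {
    if (item.kind === "prefix") {
      // item.name looks like "myIotHub/3/"; strip the hub name and trailing slash
      partitions.push(item.name.slice(hubName.length + 1, -1));
    }
  }
  return partitions;
}
```

Hour-level prefixes return whole hours, so the caller still has to drop blobs outside the exact range. A full month at hour granularity means 720 list calls per partition, which is why a structured store or an analytics engine remains the better fit for this kind of query.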