4

I am trying to add about 6000 messages to my Azure Storage Queue in an Azure Function with Node.js.

I have tried multiple ways to do this. Right now I wrap the QueueService method in a Promise and resolve the 6000 promises through Bluebird's Promise.map with a concurrency of about 50:

const Promise = require('bluebird');

const addMessages = Promise.map(messages, (msg) => {
  // returns a promise wrapping the Azure QueueService method
  return myQueueService.addMessage(msg);
}, { concurrency: 50 });

// this returns a promise that resolves when all promises have resolved;
// it rejects as soon as one of the promises rejects.
addMessages.then((results) => {
  console.log("SUCCESS");
}, (error) => {
  console.log("ERROR");
});
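For context, such a wrapper can look like this (a minimal sketch; queueName is assumed, and createMessage is the callback-based azure-storage method being wrapped):

// hypothetical wrapper: promisify the callback-based createMessage call
function addMessage(msg) {
  return new Promise((resolve, reject) => {
    queueService.createMessage(queueName, msg, (error, result) => {
      if (error) reject(error);
      else resolve(result);
    });
  });
}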

My QueueService is created with an ExponentialRetry policy.
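For reference, the service is created along these lines (a minimal sketch using the azure-storage package; connectionString is assumed):

const azure = require('azure-storage');

// retry transient failures with exponentially increasing delays
const myQueueService = azure.createQueueService(connectionString)
  .withFilter(new azure.ExponentialRetryPolicyFilter());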


I have had mixed results using this strategy:

  • All messages get added to my queue and the promise resolves correctly.
  • All messages get added to my queue and the promise does not resolve (or reject).
  • Not all messages get added to my queue and the promise does not resolve (or reject).

Am I missing something, or is it possible for my calls to sometimes take 2 minutes to resolve and sometimes more than 10 minutes?

In the future, I will probably have to add about 100,000 messages, so I'm worried about the unpredictable results I'm getting now.

What would be the best strategy to add a large number of messages in Node (in an Azure Function)?


EDIT:

Not sure how I missed this, but a pretty reliable way to add my messages to my Storage Queue is to use the queue output binding of my Azure Function:

https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue#storage-queue-output-binding

It makes my code a lot simpler as well!

// the output binding must be initialized as an array before pushing
context.bindings.outputQueue = [];
for (var i = 0; i < messages.length; i++) {
  // add each message to the queue
  context.bindings.outputQueue.push(messages[i]);
}
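The binding itself is declared in function.json; a minimal sketch (the queue and connection names are illustrative):

{
  "type": "queue",
  "direction": "out",
  "name": "outputQueue",
  "queueName": "myqueue",
  "connection": "AzureWebJobsStorage"
}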

EDIT2:

I am going to split my messages into batches of about 1000 and store these batches in Azure Blob Storage.

Another Azure Function can be triggered each time a new blob is added, and this function will handle queueing my messages 1000 at a time.
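A rough sketch of the splitting side (blobService is an azure-storage BlobService, and the container and blob names are illustrative):

// split the messages into batches of 1000 and write each batch as a JSON blob;
// a blob-triggered function then enqueues the contents of each batch
const batchSize = 1000;
for (let i = 0; i < messages.length; i += batchSize) {
  const batch = messages.slice(i, i + batchSize);
  blobService.createBlockBlobFromText('message-batches', 'batch-' + i + '.json',
    JSON.stringify(batch), (error) => {
      if (error) context.log(error);
    });
}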

This should make my queueing much more reliable and scalable: when I tried adding 20,000 messages to my queue through my output binding, the Azure Function timed out after 5 minutes, having processed only about 15,000 messages.

Schaemelhout
  • 685
  • 9
  • 25
  • I'm having the same issues. Did your last approach work for you? I like the idea and might try it. What about multithreading the add operations as well? – Brad Firesheets Oct 06 '17 at 19:02

2 Answers

2

What triggers this function? Instead of having a single function add all of those messages, I would recommend fanning out, letting multiple functions scale and take better advantage of concurrency by limiting the amount of work each one does.

With what I'm proposing above, the function that handles your current trigger would queue up the work, which would in turn be processed by other functions that perform the actual work of adding a (much) smaller number of messages to the queue. You may need to play with the numbers to see what works well for your workload, but this pattern allows those functions to scale better (including across multiple machines) and handle failures better, improving reliability and predictability.

As an example, you could carry the number of messages to add in the message that you queue to trigger the work: if you wanted 1000 messages as the final output, you could queue 10 messages instructing your "worker" functions to add 100 messages each (see the sketch below). I would also recommend experimenting with much smaller numbers per function.
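For illustration, the fan-out function might look like this (a sketch; the binding and property names are hypothetical):

module.exports = function (context, trigger) {
  const workItems = [];
  // queue 10 work items; each instructs a worker to add 100 messages
  for (let i = 0; i < 10; i++) {
    workItems.push({ startIndex: i * 100, count: 100 });
  }
  context.bindings.workQueue = workItems; // queue output binding
  context.done();
};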

I hope this helps!

Fabio Cavalcante
  • 12,328
  • 3
  • 35
  • 43
  • Thanks for your quick and elaborate reply. My function is currently triggered manually, but will eventually be triggered periodically. In my function, I have this list of thousands of messages. How do you suggest scaling this to add them to my Storage Queue? It sounds like I would need a queue to add my messages to a queue...? – Schaemelhout Nov 16 '16 at 18:40
  • I just tested my function and it was able to add 15,000 messages to my queue before timing out at 5:00 minutes... – Schaemelhout Nov 16 '16 at 18:50
  • I would still recommend following the approach above. Your initially triggered function would queue the appropriate messages (with any required state), and the worker functions (using the work queue as the trigger) would process those messages. – Fabio Cavalcante Nov 16 '16 at 19:07
  • Maybe I should be a little more specific: I have 2 functions: 1. A manual Function doing some processing on a file and extracting about 6000 strings each time. These need to be queued for my other function to process them. 2. An automatic Function processing these messages one by one using a Queue Trigger (my 'worker' function as you call it). I assume this scales pretty easily. The only problem is getting all of my messages onto the queue from my manually triggered function. I don't persist these string values anywhere else, so they need to go onto my queue in each Function trigger. – Schaemelhout Nov 16 '16 at 19:12
  • The initial function would have "chunks" of work (as opposed to one message per line/string you parse). But you get the idea. – Fabio Cavalcante Nov 16 '16 at 21:28
  • I'm sorry, but I don't really get the idea... My initial Function doesn't have chunks of work, it has 1 huge chunk of work to be processed in chunks by another Function. I read and parse a file, and that returns an array of 6000 string values. I need to add all of these at once to my Queue through my output binding. This Function would be run about once a day. – Schaemelhout Nov 16 '16 at 21:46
  • I missed some of the details you shared in the comment above and made my last comment still based on the original question, but the concept still applies. You could also break this work up with blobs (instead of queues), writing blobs with smaller chunks of data to be processed (as opposed to one queue message per string). The approach will depend on the details of what you're trying to accomplish, but the goal is to divide up and limit the work performed by any single function. – Fabio Cavalcante Nov 16 '16 at 23:24
  • 1
    I see what you mean now, I will split up my messages in batches of 1000 and save them to blob storage, to have another function with a blob trigger send them to my queue storage, thanks for your help! – Schaemelhout Nov 17 '16 at 08:18
0

Instead of using an output binding, you can send the messages in batches for better throughput. Unfortunately, unlike the Azure Service Bus API, which has batching functionality baked in, the Azure Storage Queue API does not, so you'll need to implement your own. On the bright side, there is a fairly straightforward way to achieve batching using two-dimensional arrays. Below is sample TypeScript code for batching requests using Promise.all:

Say you have an array containing 6000 JSON string messages:

const messages: string[] = [/* 6000 messages */];

const batchSize = 10;

  1. Construct the message batches in the form of a 2D array:
const createMessageBatches = (items: string[]): string[][] => {
    const messageBatches: string[][] = [];

    for (let i = 0; i < items.length; i += batchSize) {
        messageBatches.push(items.slice(i, i + batchSize));
    }

    return messageBatches;
}
  2. Send the batches using the QueueClient sendMessage() API:
const messageBatches: string[][] = createMessageBatches(messages);

// await needs an async context; batches are sent one at a time,
// while the sends within each batch run concurrently
const sendBatches = async (): Promise<void> => {
    for (const batch of messageBatches) {
        await Promise.all(batch.map(message => queueClient.sendMessage(message)));
    }
};
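For completeness, the queueClient used above can be constructed like this (a sketch assuming the @azure/storage-queue package; the connection string and queue name are illustrative):

import { QueueClient } from "@azure/storage-queue";

const connectionString = process.env.AZURE_STORAGE_CONNECTION_STRING ?? "";
const queueClient = new QueueClient(connectionString, "myqueue");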
Skillz
  • 308
  • 4
  • 10