I have about 800k blobs in my Azure storage account. When I create an Azure Function with a BlobTrigger, it starts processing all of the blobs already in storage. How can I configure my function to be triggered only for new and updated blobs?
3 Answers
There is no way to do this currently. Internally we track which blobs we have processed by storing receipts in our control container `azure-webjobs-hosts`. Any blob without a receipt, or with an old receipt (based on the blob's ETag), will be processed (or reprocessed). That's why your existing blobs are being processed: they don't have receipts. `BlobTrigger` is currently designed to ensure that ALL blobs in a container matching the path pattern are eventually processed, and reprocessed any time they are updated.

If you feel strongly about this, you can log a feature request in our repo here with details on your scenario.

- Thanks for the explanation. I will file a feature request. But for now, is there any workaround? Can I generate all the receipts on my own? – ebashmakov Dec 07 '16 at 06:07
- No real workaround, short of letting all the blobs be processed. You could write a no-op function and let it churn through all the blobs, which would generate the receipts. Once that is done, put your actual function logic in place, and going forward those old blobs would only be reprocessed if changed. – mathewc Dec 07 '16 at 06:34
- Yep, I was thinking about this too, but I'm a bit afraid it will take too long. Anyway, I will try. Thanks for the help. – ebashmakov Dec 07 '16 at 07:46
- FYI, there's a new issue tracking this here: https://github.com/Azure/azure-webjobs-sdk/issues/1327 – mjlescano Jan 31 '18 at 15:49
- @mathewc writing a no-op function is a great idea, but what if I ran this and then wanted to reprocess the processed blobs once again? Should I just clear `azure-webjobs-hosts`? – Amr Elgarhy Mar 25 '18 at 20:03
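The no-op approach described in the comments above can be sketched as follows. This is only an illustration, not code from the answer: the function name, container path `samples-workitems/{name}`, and connection name are placeholders you would replace with your own.

```csharp
public static class WarmupReceipts
{
    [FunctionName("WarmupReceipts")]
    public static void Run(
        [BlobTrigger("samples-workitems/{name}", Connection = "AzureWebJobsStorage")] Stream blob,
        string name,
        ILogger log)
    {
        // Intentionally do nothing: simply completing the invocation causes the
        // WebJobs SDK to store a receipt for this blob in azure-webjobs-hosts.
        log.LogInformation($"Receipt generated for blob: {name}");
    }
}
```

Once this has churned through the backlog, redeploy with your real processing logic under the same function name and path pattern so the stored receipts still apply.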
The way I get around this is to set metadata on a processed blob (e.g. `Status = Complete`). When the trigger fires, I first check for this piece of metadata and return from the function early if it is already set.

The downside is that updating the metadata will trigger one additional execution of the function.
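A minimal sketch of this metadata guard, assuming the newer `Azure.Storage.Blobs`-based storage extension (v5+), which lets the trigger bind a `BlobClient` instead of a `Stream`. The function name, container path, and the `Status` key are illustrative:

```csharp
[FunctionName("ProcessOnce")]
public static async Task Run(
    [BlobTrigger("samples-workitems/{name}", Connection = "AzureWebJobsStorage")] BlobClient blob,
    string name,
    ILogger log)
{
    // Bail out early if a previous run already marked this blob as done.
    var props = await blob.GetPropertiesAsync();
    if (props.Value.Metadata.TryGetValue("Status", out var status) && status == "Complete")
    {
        log.LogInformation($"Skipping already-processed blob: {name}");
        return;
    }

    // ... actual processing ...

    // Mark the blob; the extra execution this update triggers will exit early above.
    await blob.SetMetadataAsync(new Dictionary<string, string> { ["Status"] = "Complete" });
}
```

Note that `SetMetadataAsync` replaces all existing metadata on the blob, so merge in any other keys you need to preserve.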

- Also thought of this, but doesn't that require fetching every blob into the Function as an InputStream before adjudicating on its metadata? I don't know whether the fetching of the blob is deferred until the (in Python) read(). For 800k images, is that feasible? – jtlz2 Jun 25 '19 at 12:21
I know this question is old, but my answer may help a new dev fix it.

On the `BlobTrigger` you can add a `BlobProperties` parameter, which has a `Created` property, and filter on that date:
```csharp
public async Task Run(
    [BlobTrigger("xyz/{name}", Connection = "AzureWebJobsStorage")] Stream file,
    string name,
    ILogger logger,
    BlobProperties properties)
{
    // Skip any blob created before the cutoff date
    if (properties.Created < new DateTime(2022, 03, 18, 0, 0, 0))
    {
        logger.LogInformation("SKIP");
        return;
    }
    ...
}
```
