I have a Log Analytics Diagnostic Setting on a very active ADLS Gen2 Storage Account. The goal is to reconcile blobs uploaded to Containers within the Storage Account with blobs processed by an Azure Function.

Problem: Azure Log Analytics does not return result sets larger than 30k records.

Ideally, the reconciliation happens all at once: compare incoming blobs with processed blobs at the end of the day.

But this doesn't seem possible when there are more than 30k records. It seems like I'll have to schedule some kind of hourly reconciliation (not ideal).

What are some strategies for handling this in a simple way?

ericOnline

2 Answers

I had a similar issue when calling the App Insights API. The query I passed looked like this, and it returned only 30K results:

customEvents
| where operation_Name == 'GET /api/v1/endpointName'
| order by name desc
| project name

The problem was that it should have returned over 100K results. I was able to work around this by modifying the query like this:

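customEvents
| where operation_Name == 'GET /api/v1/endpointName'
| order by name desc
// order by serializes the rows, which row_number() requires
| extend rn=row_number()
| project name, rn
// rn > N skips the first N rows; replace 0 with 30000, 60000, ...
| where rn > 0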

In the last line, I passed 0, then 30000, 60000, 90000, and so on. When I finally passed 120000, no results were returned, which is how I knew when to stop.
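The offset loop described above could be driven by a small script. This is only a minimal sketch: `run_query` is a hypothetical callable (not part of any SDK) that submits a KQL string and returns the rows as a list of dicts, and the page size of 30000 simply mirrors the cap observed above. Unlike the original query, each page here also has an upper bound on `rn`, so it does not rely on the service truncating the result.

```python
from typing import Callable, List

# Base query from the answer above, without the final rn filter.
BASE_QUERY = """customEvents
| where operation_Name == 'GET /api/v1/endpointName'
| order by name desc
| extend rn = row_number()
| project name, rn"""


def build_page_query(base_query: str, offset: int, page_size: int) -> str:
    """Append an rn window so the page covers rows offset+1 .. offset+page_size."""
    return f"{base_query}\n| where rn > {offset} and rn <= {offset + page_size}"


def fetch_all(
    run_query: Callable[[str], List[dict]],  # hypothetical: runs KQL, returns rows
    base_query: str = BASE_QUERY,
    page_size: int = 30000,
) -> List[dict]:
    """Request pages at offsets 0, page_size, 2*page_size, ... until one is empty."""
    rows: List[dict] = []
    offset = 0
    while True:
        page = run_query(build_page_query(base_query, offset, page_size))
        if not page:
            break  # an empty page means there is nothing left to fetch
        rows.extend(page)
        offset += page_size
    return rows
```

One caveat with this approach: if `name` has duplicate values, the sort order among ties may differ between runs, so rows near a page boundary could in principle be repeated or skipped. Sorting on an additional, more unique column would reduce that risk.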

Varun Sharma

You could use a cursor in the filter. For example, run a query that sorts the results on a specific column, preferably a unique one. Then take the last event in the result and use its value as a filter in the next request to get the following results. I'm not sure whether monitoring guarantees that 100% of the incoming requests are logged.
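A cursor loop along these lines might look like the sketch below. Several things are assumptions rather than anything stated above: `run_query` is a hypothetical callable that runs a KQL string and returns rows as dicts, the table is taken to be `StorageBlobLogs`, the sort column is `TimeGenerated`, and its values are assumed to come back as ISO-8601 strings usable inside a `datetime(...)` literal.

```python
from typing import Callable, List, Optional


def build_cursor_query(
    table: str,
    cursor: Optional[str],
    page_size: int,
    column: str = "TimeGenerated",  # assumed sort column
) -> str:
    """Build a query returning the next page_size rows after the cursor value."""
    lines = [table]
    if cursor is not None:
        # Only rows strictly after the last value seen on the previous page.
        lines.append(f"| where {column} > datetime({cursor})")
    lines.append(f"| order by {column} asc")
    lines.append(f"| take {page_size}")
    return "\n".join(lines)


def fetch_with_cursor(
    run_query: Callable[[str], List[dict]],  # hypothetical: runs KQL, returns rows
    table: str,
    page_size: int = 30000,
    column: str = "TimeGenerated",
) -> List[dict]:
    """Page through the table, using the last row's column value as the cursor."""
    rows: List[dict] = []
    cursor: Optional[str] = None
    while True:
        page = run_query(build_cursor_query(table, cursor, page_size, column))
        if not page:
            break
        rows.extend(page)
        cursor = page[-1][column]
    return rows
```

Because the filter is a strict `>`, rows that share the exact cursor value with the last row of a page would be skipped, which is why a unique sort column is preferable.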

You could also look at using proper event-driven resources. For example, you could set up an Event Grid subscription on the Storage Account that pushes each blob creation event to either a Storage Queue or a Service Bus queue, and then process your blobs using that information.
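If the blob creation events end up in a queue, the reconciliation itself could be a set difference. The sketch below assumes the events use the Event Grid schema for `Microsoft.Storage.BlobCreated`, where `subject` has the form `/blobServices/default/containers/<container>/blobs/<blob>`. The function names and the shape of the `processed` collection are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

SUBJECT_PREFIX = "/blobServices/default/containers/"


def blob_from_event(event: dict) -> Optional[Tuple[str, str]]:
    """Return (container, blob path) for a BlobCreated event, else None."""
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        return None
    subject = event.get("subject", "")
    if not subject.startswith(SUBJECT_PREFIX):
        return None
    # Split "<container>/blobs/<blob path>" at the first "/blobs/" marker.
    container, sep, blob = subject[len(SUBJECT_PREFIX):].partition("/blobs/")
    if not sep:
        return None
    return container, blob


def unprocessed(
    events: Iterable[dict],
    processed: Iterable[Tuple[str, str]],  # hypothetical: blobs the Function handled
) -> Set[Tuple[str, str]]:
    """Blobs that were created but do not appear in the processed set."""
    created = {b for b in map(blob_from_event, events) if b is not None}
    return created - set(processed)
```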

NotFound
  • Thank you for the ideas. I'll try to find a unique value or an incrementor of some kind (timestamp, ID, etc.). I'm a bit hesitant to set up an Event Grid to establish logging that monitors other Event Grid services. – ericOnline May 20 '21 at 20:39
  • I was actually able to reduce the number of results by excluding certain `OperationNames`. I really don't need to log all the `GetBlob` instances. Just the `PutBlob` and `AppendBlob` actions. This helped keep the numbers below 30k. – ericOnline May 20 '21 at 20:41