
I have some Azure Data Factory pipelines that sometimes run for more than 2 hours when copying data from Blob storage into SQL. The run time varies, but I'd like to be notified/alerted whenever a pipeline runs beyond 2 hours.

What are possible ways of doing this?

What I have tried so far:

  • I explored the ADF metrics on which I can set an alert rule, but none of them seems to cover the duration of an active run.
  • I was hoping to get the pipeline's duration value, as shown on the Monitor tab in adf.azure.com, and use it to set up some sort of alert.
  • I was also wondering whether I can get the pipeline start time, so that I can calculate the total run time from the current time and put an alert on top of that.


DhruvJoshi

3 Answers


We do something like this to track running pipelines and manage execution concurrency. I find Logic Apps and Azure Functions great tools for creating these kinds of solutions. Here is a rough outline of how we handle this:

  1. A set of Azure Functions (AF) that leverage the Microsoft.Azure.Management.DataFactory SDK. The relevant code is at the bottom of this post.
  2. A log of pipeline executions in a SQL Server table. The table includes the PipelineId and Status, and some other information. You would need to INSERT into this table whenever you start a pipeline run. We use a separate Logic App that calls an AF to execute the pipeline using the "RunPipelineAsync" method in the code below, captures the new PipelineId (RunId), and sends it to a Stored Procedure that logs the PipelineId.
  3. A Logic App running on a recurrence trigger (every 3 minutes) that a) calls a Stored Procedure that polls the table (#2 above) and returns all pipelines with Status = "InProgress"; b) loops over the returned list and calls an AF (#1 above) that checks the current status of each pipeline using the "GetPipelineInfoAsync" method in the code below; and c) calls another Stored Procedure to update the status in the table.

You could do something similar and use the "DurationInMs" value to trigger appropriate actions when Status = "InProgress" and the total running time > {desired alert threshold}.

Here is the DataFactoryHelper class I use:

using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest;
using Microsoft.Azure.Management.ResourceManager;
using Microsoft.Azure.Management.DataFactory;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace AzureUtilities.DataFactory
{
    public class DataFactoryHelper
    {
        private ClientCredential Credentials { get; set; }
        private string KeyVaultUrl { get; set; }
        private string TenantId { get; set; }
        private string SubscriptionId { get; set; }

        private DataFactoryManagementClient _client = null;
        private DataFactoryManagementClient Client
        {
            get {
                if (_client == null)
                {
                    var context = new AuthenticationContext("https://login.windows.net/" + TenantId);
                    AuthenticationResult result = context.AcquireTokenAsync("https://management.azure.com/", Credentials).Result;
                    ServiceClientCredentials cred = new TokenCredentials(result.AccessToken);
                    _client = new DataFactoryManagementClient(cred) { SubscriptionId = SubscriptionId };
                }

                return _client;
            }
        }

        public DataFactoryHelper(string servicePrincipalId, string servicePrincipalKey, string tenantId, string subscriptionId)
        {
            Credentials = new ClientCredential(servicePrincipalId, servicePrincipalKey);
            TenantId = tenantId;
            SubscriptionId = subscriptionId;
        }

        public async Task<string> RunPipelineAsync(string resourceGroupName,
                                                   string dataFactoryName,
                                                   string pipelineName,
                                                   Dictionary<string, object> parameters = null,
                                                   Dictionary<string, List<string>> customHeaders = null)
        {
            var runResponse = await Client.Pipelines.CreateRunWithHttpMessagesAsync(resourceGroupName, dataFactoryName, pipelineName, parameters: parameters, customHeaders: customHeaders);
            return runResponse.Body.RunId;
        }

        public async Task<object> GetPipelineInfoAsync(string resourceGroup, string dataFactory, string runId)
        {
            var info = await Client.PipelineRuns.GetAsync(resourceGroup, dataFactory, runId);
            return new
            {
                RunId = info.RunId,
                PipelineName = info.PipelineName,
                InvokedBy = info.InvokedBy.Name,
                LastUpdated = info.LastUpdated,
                RunStart = info.RunStart,
                RunEnd = info.RunEnd,
                DurationInMs = info.DurationInMs,
                Status = info.Status,
                Message = info.Message
            };
        }
    }
}
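
The threshold check described above could be sketched roughly as follows. This is a minimal illustration, not part of the helper class: the `PipelineRunInfo` type and the `LongRunningPipelineCheck.FindOverdue` method are hypothetical names, and the sketch assumes `RunStart` is reported in UTC. Because it is not confirmed here that `DurationInMs` is populated while a run is still in progress, the sketch computes elapsed time from `RunStart` when it is available and only falls back to `DurationInMs`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace AzureUtilities.DataFactory
{
    // Hypothetical container mirroring the fields returned by GetPipelineInfoAsync.
    public class PipelineRunInfo
    {
        public string RunId { get; set; }
        public string PipelineName { get; set; }
        public string Status { get; set; }
        public DateTime? RunStart { get; set; }   // assumed to be UTC
        public int? DurationInMs { get; set; }
    }

    // Hypothetical helper: picks out the runs that should raise an alert.
    public static class LongRunningPipelineCheck
    {
        // Returns the runs that are still "InProgress" and have been running
        // longer than the given threshold as of utcNow.
        public static List<PipelineRunInfo> FindOverdue(
            IEnumerable<PipelineRunInfo> runs, TimeSpan threshold, DateTime utcNow)
        {
            return runs
                .Where(r => r.Status == "InProgress")
                .Where(r => Elapsed(r, utcNow) > threshold)
                .ToList();
        }

        // Elapsed time from RunStart when known; otherwise from DurationInMs
        // (treated as zero when neither value is present).
        private static TimeSpan Elapsed(PipelineRunInfo run, DateTime utcNow)
        {
            if (run.RunStart.HasValue)
            {
                return utcNow - run.RunStart.Value;
            }

            return TimeSpan.FromMilliseconds(run.DurationInMs ?? 0);
        }
    }
}
```

A polling Azure Function could call `LongRunningPipelineCheck.FindOverdue(runs, TimeSpan.FromHours(2), DateTime.UtcNow)` on each recurrence and send a notification for every run it returns. Passing the current time in as a parameter keeps the check easy to unit test.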
Joel Cochran
  • I was wondering how this solution would evolve with the (recently?) updated metric "elapsed time pipeline". Can this be used within Azure Data Factory instead together with an alert rule or is strictly relevant for failed pipelines? – bramb Dec 02 '20 at 15:38
  • @bramb The Elapsed Time Pipeline Runs Metrics capability is limited to triggering warnings based on the aggregated count of error occurrences within a selected time window. It doesn't actually have anything to do with the duration of pipeline runs. – adrien Dec 07 '21 at 10:15

One workaround would be to log a timestamp in your SQL database as the first step in your pipeline, and then keep track of the load by monitoring the sessions in your database engine.

Cedersved

Since September 2022 it is possible to define an elapsed time after which ADF will record a metric in Azure Monitor. Alerting can be triggered from there. This is configured on the Settings tab of a pipeline. Details are found at this link.

Michael Green