fetching Cloud Data Fusion Runtime info

Question

I want to pass the run ID of a Data Fusion pipeline to a function when the pipeline completes, but I cannot find any runtime variable that holds this value. Please help!

Hello! Could you clarify whether you want to retrieve the run_id during the pipeline run or after completion? Thanks! — aga, Oct 30 '20 at 11:23
Hi Ines, I want to pass the run_id of the pipeline to a Cloud Function when the pipeline succeeds, via an HTTP pipeline alert. I don't know whether there is any variable or argument holding the run_id of the pipeline that I can pass to the function. — SUDHIR GARG, Oct 31 '20 at 12:34
Hi Sudhir, currently there is runtime information that has the runId information. Can I ask what it is you are trying to do in the Cloud Function that requires the pipeline runId? We have this JIRA for the feature request: https://issues.cask.co/browse/CDAP-12719 — Edwin Elia, Nov 02 '20 at 23:24
Hi Edwin, I am planning to fetch pipeline stats such as records.in and records.out of the plugins using the 'POST -H "Authorization: Bearer ${AUTH_TOKEN}" "${CDAP_ENDPOINT}/v3/metrics/query"' API, which requires the run ID of a pipeline run as input in the body. So when the pipeline succeeds, I want to pass its run ID to a Cloud Function, and the Cloud Function will fetch the stats with that run ID. Refer: https://cloud.google.com/data-fusion/docs/reference/cdap-reference#metrics_for_a_batch_pipeline — SUDHIR GARG, Nov 03 '20 at 10:05
Hi Edwin, you mentioned that this information is available in the runtime information. I tried accessing runtime['runid'] and runtime['run_id'], but neither exists. What is the key in runtime to get the run_id? — SUDHIR GARG, Nov 05 '20 at 04:05

score 1 · Answer 1 · answered Aug 02 '21 at 15:47

As an update to the previous answer, the first thing to do is to obtain the details of the pipelines deployed in a given namespace. For this, query the following endpoint: '/v3/namespaces/${NAMESPACE}/apps', where ${NAMESPACE} is the namespace in which the pipeline is deployed.

This endpoint returns a high-level list of the pipelines deployed in the namespace ${NAMESPACE} (a description of each pipeline, not the pipeline JSON). Once you have the pipeline list, call the following endpoint to obtain the run details of a given pipeline: '/v3/namespaces/${NAMESPACE}/apps/${PIPELINE}/workflows/DataPipelineWorkflow/runs', where ${PIPELINE} is the name of the pipeline. It returns the details of all the runs of this pipeline, and this is where the run_id can be obtained. The field containing the run_id is actually called runid in this list.
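As a rough sketch of the step above, the runs list can be fetched and the runid picked out in Python. The helper names (runs_url, fetch_json, latest_runid) are made up for illustration, and the "status" field used for the optional filter is my assumption about the run record; only "runid" and "start" are taken from this answer.

```python
import json
import urllib.request


def runs_url(cdap_endpoint, namespace, pipeline):
    """Build the URL that lists every run of a batch pipeline."""
    base = cdap_endpoint.rstrip("/")
    return (f"{base}/v3/namespaces/{namespace}/apps/{pipeline}"
            "/workflows/DataPipelineWorkflow/runs")


def fetch_json(url, token):
    """GET a CDAP endpoint with a bearer token and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def latest_runid(runs, status=None):
    """Return the runid of the most recently started run, or None.

    Assumes each run record carries "runid" and "start"; the optional
    filter assumes a "status" field (for example "COMPLETED").
    """
    candidates = [r for r in runs
                  if status is None or r.get("status") == status]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.get("start", 0))["runid"]
```

Usage would look like latest_runid(fetch_json(runs_url(endpoint, "default", "my_pipeline"), token)), where endpoint and token come from the gcloud commands mentioned in this answer and "my_pipeline" is a placeholder name.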

With the run_id, you can then obtain, for example, all the run logs by querying the endpoint '{CDAP_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps/{PIPELINE}/workflows/DataPipelineWorkflow/runs/{run["runid"]}/logs?start={run["start"]}&stop={run["end"]}'. This snippet is a Python f-string in which run is a dictionary containing the details of a particular run.
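For completeness, here is a minimal sketch of building that logs URL from one run record. The function name logs_url is hypothetical, and I am assuming a finished run record also carries an "end" timestamp to use as the stop bound; if it is missing, the sketch falls back to "start".

```python
from urllib.parse import urlencode


def logs_url(cdap_endpoint, namespace, pipeline, run):
    """Build the logs URL for one run record (a dict from the runs list)."""
    base = cdap_endpoint.rstrip("/")
    # Assumption: "end" is present on finished runs; fall back to "start".
    query = urlencode({
        "start": run["start"],
        "stop": run.get("end", run["start"]),
    })
    return (f"{base}/v3/namespaces/{namespace}/apps/{pipeline}"
            f"/workflows/DataPipelineWorkflow/runs/{run['runid']}/logs?{query}")
```

The resulting URL still has to be requested with the same Authorization: Bearer header as the other endpoints.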

As explained in the CDAP microservice guide, to call these endpoints you first need the CDAP endpoint, which can be obtained by running the command: gcloud beta data-fusion instances describe --project=${PROJECT} --location=${REGION} --format="value(apiEndpoint)" ${INSTANCE_ID}. The authentication token is also needed and can be obtained by running: gcloud auth print-access-token.
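If you would rather drive those two gcloud commands from Python, something along these lines might work. It is only a sketch: the helper names are invented here, and it assumes the gcloud CLI is installed, on the PATH, and already authenticated.

```python
import subprocess

# argv for printing an OAuth access token for the active gcloud account.
ACCESS_TOKEN_CMD = ["gcloud", "auth", "print-access-token"]


def describe_endpoint_cmd(project, region, instance_id):
    """argv for the gcloud call that prints the instance's apiEndpoint."""
    return ["gcloud", "beta", "data-fusion", "instances", "describe",
            f"--project={project}", f"--location={region}",
            "--format=value(apiEndpoint)", instance_id]


def run_cli(argv):
    """Run a command and return its stdout without surrounding whitespace."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

For example, run_cli(describe_endpoint_cmd(project, region, instance_id)) would give the CDAP endpoint and run_cli(ACCESS_TOKEN_CMD) the token to send as Authorization: Bearer.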

aga · Answer 2 · 2020-11-13T15:12:01.490

The correct answer has been provided by @Edwin Elia in the comment section:

Retrieving the run-id of a Data Fusion pipeline from within its own run, or from within a predecessor pipeline's run, is not currently possible. Here is an enhancement that you can track that would make it possible.

To retrieve the run_id value after pipeline completion, you should be able to use the REST API from the CDAP documentation to get information on the run, including the run-id.
