
I'm trying to run a Synapse pipeline from a Synapse notebook. Is there any way to do it?

My Synapse pipeline has parameters. If it's possible to run it from a notebook, how do I pass the params?

Robert G
  • Not sure if this would be blocked, although the pipelines do have a REST API. But why not use a Synapse Pipeline to do the main orchestration and let notebooks do fun Spark things? – wBob Jul 22 '22 at 13:08
  • Yes, this (...use a Synapse Pipeline to do the main orchestration...) works fine. I just wanted to treat a pipeline, with params, as a "tool" called from Spark. – Robert G Jul 25 '22 at 19:36
  • Yeah ok, but you probably need to think about cost too. There’s always a 2-3 minute start up time for the Spark pools, plus you’ll have them running the whole time your pipeline is running? Sounds expensive to me. – wBob Jul 25 '22 at 20:18

1 Answer


As suggested by wBob, I would also recommend not doing this (calling Synapse pipelines from a Synapse notebook) because of the Spark pool startup time and cost.

But if you want to give it a try, here is an approach other than the REST API.

In this approach, I added a storage event trigger to the Synapse pipeline, and the notebook fires it by writing a file to the storage account.

For this you need a storage account and an empty container.

First, create a linked service for ADLS Gen2 in Synapse and add a storage event trigger to the Synapse pipeline.
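For reference, a storage event trigger definition looks roughly like the sketch below; the trigger name, container path, storage account scope and pipeline reference are placeholders you would replace with your own values.

{
    "name": "StorageEventTrigger1",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "blobPathBeginsWith": "/input/blobs/",
            "blobPathEndsWith": ".json",
            "ignoreEmptyBlobs": true,
            "scope": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>",
            "events": ["Microsoft.Storage.BlobCreated"]
        },
        "pipelines": [
            { "pipelineReference": { "referenceName": "<your-pipeline>", "type": "PipelineReference" } }
        ]
    }
}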

In the Synapse notebook, write a JSON file to ADLS, which triggers the pipeline. You can pass the parameters from the notebook in that same JSON.

Code in the Synapse notebook with sample parameters:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Sample parameters to pass to the pipeline
myjson=[{"Name":"Rakesh","Age":22,"Marks":90}]
schema2=StructType([StructField('Name',StringType(),True),
                    StructField('Age',IntegerType(),True),
                    StructField('Marks',IntegerType(),True)])
df = spark.createDataFrame(data=myjson, schema=schema2)
df.show(truncate=False)

# Writing the JSON file to the ADLS container fires the storage event trigger
df2 = df.toPandas()
df2.reset_index().to_json('abfss://input/myjson.json',
                          storage_options={'linked_service': 'AzureDataLakeStorage1'},
                          orient='records')


Here input is my container and AzureDataLakeStorage1 is my linked service. reset_index() includes the row index along with the parameters, and orient='records' writes them as a list of JSON objects.

This creates the myjson.json file in the ADLS container.
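Given the sample data above, the file content should look roughly like this (the index field comes from reset_index()):

[{"index":0,"Name":"Rakesh","Age":22,"Marks":90}]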


This file triggers the Synapse pipeline. To use the parameters we passed, use a Lookup activity in the pipeline.

Lookup activity:

In the Lookup, don't give a file name in the dataset; instead, set the wildcard file path to *.json.

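In JSON terms, the relevant part of the Lookup activity would be roughly as below. This is only a sketch: the dataset and activity names are placeholders, and First row only is unchecked (firstRowOnly: false) so that every record comes back in an array.

{
    "name": "Lookup1",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "JsonSource",
            "storeSettings": {
                "type": "AzureBlobFSReadSettings",
                "wildcardFileName": "*.json"
            },
            "formatSettings": { "type": "JsonReadSettings" }
        },
        "dataset": { "referenceName": "Json1", "type": "DatasetReference" },
        "firstRowOnly": false
    }
}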

The Lookup will give a result like this:

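A sketch of what that output looks like, assuming First row only is unchecked and the file written above was picked up:

{
    "count": 1,
    "value": [
        { "index": 0, "Name": "Rakesh", "Age": 22, "Marks": 90 }
    ]
}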

You can use expressions such as @activity('Lookup1').output.value[0].Age and @activity('Lookup1').output.value[0].Name to access the parameters, with their own data types, in the Synapse pipeline.

For example, in a Set variable activity:

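For instance, the Set variable activity's value expression could be something like the following (the activity name Lookup1 matches the expressions above; the variable itself is up to you):

@string(activity('Lookup1').output.value[0].Age)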

Output:

As variables only support the String, Boolean and Array types, I converted the value to a string to show the output. You can use those parameters wherever you want.

Rakesh Govindula
  • ", I would also recommend not doing this" -> that's true. But Rakesh great thanks for showing the way, if i'll be desperated i'll try it :) – Robert G Oct 24 '22 at 14:06