You have a few options. I will describe three here:
- If you have a nearby Azure SQL DB and the volumes aren't too large, present the table in that Azure SQL DB using CREATE EXTERNAL TABLE, or simply use Azure Data Factory (ADF) to copy the data over, perform your recursive CTE there, then use ADF to port the results back. Alternatively, use some kind of pre-processing before this data gets into your SQL pool.
- A recursive CTE is just a type of loop at the end of the day, and WHILE is supported on Synapse. Obviously this type of loop does not translate well to Synapse as it's chatty, but it may be an option for small volumes with shallow hierarchies. It's up to you to examine the trade-off of using the MPP architecture ineffectively in this way versus coding up an alternative.
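For context, the recursive CTE mentioned in option 1 would look something like the sketch below. This uses SQLite (via Python) purely for illustration, since its WITH RECURSIVE syntax is close to the T-SQL you would write in Azure SQL DB; the table and column names mirror the sample data used later in this answer.

```python
import sqlite3

# Illustrative only: SQLite's recursive CTE syntax is close to T-SQL's.
# someHierarchy mirrors the sample data below; roots are self-referencing
# (under = code).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE someHierarchy (code INT, name TEXT, under INT)")
con.executemany(
    "INSERT INTO someHierarchy VALUES (?, ?, ?)",
    [(1, 'National Sales Manager', 1),
     (2, 'Regional Sales Manager', 1),
     (3, 'Area Sales Manager', 2),
     (4, 'Sales Manager', 3)],
)

rows = con.execute("""
    WITH RECURSIVE cte (xlevel, code, name, under, ultimateParent) AS (
        -- anchor: self-referencing rows are the roots
        SELECT 1, code, name, under, under
        FROM someHierarchy
        WHERE under = code
        UNION ALL
        -- recursive: walk down one level at a time
        SELECT c.xlevel + 1, h.code, h.name, h.under, c.ultimateParent
        FROM someHierarchy h
        JOIN cte c ON h.under = c.code
        WHERE h.under != h.code
    )
    SELECT code, ultimateParent FROM cte ORDER BY code
""").fetchall()
print(rows)  # [(1, 1), (2, 1), (3, 1), (4, 1)]
```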
I've coded up an example of option 2, which took over 20 seconds to run for just a few rows. Normally I would regard that as unacceptable, but as mentioned it's up to you to weigh up the alternatives:
IF OBJECT_ID('dbo.someHierarchy') IS NOT NULL
DROP TABLE dbo.someHierarchy;
CREATE TABLE dbo.someHierarchy (
code INT NOT NULL,
[name] VARCHAR(50) NOT NULL,
under INT NOT NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
HEAP
);
INSERT INTO dbo.someHierarchy ( code, [name], under )
SELECT 1, 'National Sales Manager', 1
UNION ALL
SELECT 2, 'Regional Sales Manager', 1
UNION ALL
SELECT 3, 'Area Sales Manager', 2
UNION ALL
SELECT 4, 'Sales Manager', 3
INSERT INTO dbo.someHierarchy ( code, [name], under )
SELECT 5, 'Lead Bob', 5
UNION ALL
SELECT 6, 'Main Bob', 5
UNION ALL
SELECT 7, 'Junior Bob 1', 6
UNION ALL
SELECT 8, 'Junior Bob 2', 6
INSERT INTO dbo.someHierarchy ( code, [name], under )
SELECT 9, 'Jim - CEO', 9
UNION ALL
SELECT 10, 'Tim - CFO', 9
UNION ALL
SELECT 11, 'Rob - CIO', 9
UNION ALL
SELECT 12, 'Bob - VP', 10
UNION ALL
SELECT 13, 'Shon - Director', 12
UNION ALL
SELECT 14, 'Shane - VP', 11
UNION ALL
SELECT 15, 'Sheryl - VP', 11
UNION ALL
SELECT 16, 'Dan - Director', 15
UNION ALL
SELECT 17, 'Kim - Director', 15
UNION ALL
SELECT 18, 'Carlo - PM', 16
UNION ALL
SELECT 19, 'Monty - Sr Dev', 18
UNION ALL
SELECT 20, 'Chris - Sr Dev', 18
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp;
CREATE TABLE #tmp (
xlevel INT NOT NULL,
code INT NOT NULL,
[name] VARCHAR(50) NOT NULL,
under INT NOT NULL,
ultimateParent INT NOT NULL
);
-- Insert first level; similar to anchor section of CTE
INSERT INTO #tmp ( xlevel, code, [name], under, ultimateParent )
SELECT 1 AS xlevel, code, [name], under, under AS ultimateParent
FROM dbo.someHierarchy
WHERE under = code;
-- Loop section
DECLARE @i INT = 1
WHILE EXISTS (
SELECT * FROM dbo.someHierarchy h
WHERE NOT EXISTS ( SELECT * FROM #tmp t WHERE h.code = t.code )
)
BEGIN
-- Insert subsequent levels; similar to recursive section of CTE
INSERT INTO #tmp ( xlevel, code, [name], under, ultimateParent )
SELECT t.xlevel + 1, h.code, h.[name], h.under, t.ultimateParent
FROM #tmp t
INNER JOIN dbo.someHierarchy h ON t.code = h.under
WHERE h.under != h.code
AND t.xlevel = @i;
-- Increment counter
SET @i += 1
-- Loop guard
IF @i > 99
BEGIN
RAISERROR( 'Too many loops!', 16, 1 )
BREAK
END
END
SELECT 'loop' s, *
FROM #tmp
ORDER BY code, xlevel;
Results:

The WHILE EXISTS condition makes this a particularly expensive way to drive the loop, so maybe there is a simpler way with your data.
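To make the mechanics clear, the WHILE loop above is just a level-by-level (breadth-first) expansion of the hierarchy. Here is a minimal sketch of the same algorithm in plain Python; the tuples mirror a subset of the sample data, and the level gating corresponds to the t.xlevel = @i predicate in the SQL:

```python
# A plain-Python sketch of the level-by-level expansion the WHILE loop performs.
# rows are (code, name, under); roots are self-referencing (under == code).
rows = [
    (9, 'Jim - CEO', 9), (10, 'Tim - CFO', 9),
    (12, 'Bob - VP', 10), (13, 'Shon - Director', 12),
]

# Anchor: level 1 is every self-referencing row; ultimateParent is its own code.
resolved = {code: (1, under) for code, _, under in rows if under == code}

# Loop: each pass resolves the rows whose parent was resolved in the previous pass.
level = 1
while len(resolved) < len(rows):
    level += 1
    for code, _, under in rows:
        if code not in resolved and under in resolved and resolved[under][0] == level - 1:
            resolved[code] = (level, resolved[under][1])  # inherit ultimateParent
    if level > 99:
        raise RuntimeError('Too many loops!')  # loop guard, as in the SQL

print(resolved)  # {9: (1, 9), 10: (2, 9), 12: (3, 9), 13: (4, 9)}
```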
A third alternative is to use an Azure Synapse notebook and a library like GraphFrames to walk the hierarchy. There may be simpler ways to do it, but I found the Connected Components method was able to determine the ultimate manager. One advantage of using GraphFrames is that it would allow more complex graph queries, e.g. using motifs, if required.
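To illustrate why connected components yields the ultimate manager: as I understand it, GraphFrames labels each component with the smallest vertex id it contains, so the component id equals the top manager's code only because the roots happen to have the lowest codes in this sample data. A minimal union-find sketch of the idea in plain Python (edges mirror a subset of the (code, under) pairs, excluding self-references):

```python
# Minimal connected-components sketch via union-find.
# edges mirror (code, under) pairs from the sample data, minus self-references.
edges = [(2, 1), (3, 2), (4, 3), (10, 9), (12, 10), (6, 5)]
nodes = {n for e in edges for n in e} | {1, 5, 9}  # include the roots

parent = {n: n for n in nodes}

def find(n):
    # Walk up to the representative, compressing the path as we go.
    while parent[n] != n:
        parent[n] = parent[parent[n]]
        n = parent[n]
    return n

def union(a, b):
    # Keep the smaller id as representative, like GraphFrames' component id.
    ra, rb = find(a), find(b)
    if ra != rb:
        lo, hi = min(ra, rb), max(ra, rb)
        parent[hi] = lo

for a, b in edges:
    union(a, b)

components = {n: find(n) for n in sorted(nodes)}
print(components)  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 5, 6: 5, 9: 9, 10: 9, 12: 9}
```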
This notebook uses the Spark (Scala) version of GraphFrames.
Upload the version of the GraphFrames library that matches your Spark pool:
%%configure -f
{
"conf": {
"spark.jars": "abfss://{yourContainer}@{yourDataLake}.dfs.core.windows.net/synapse/workspaces/{yourWorkspace}/sparkpools/{yourSparkpool}/libraries/graphframes-0.8.1-spark2.4-s_2.11.jar"
}
}
Replace the elements in curly brackets with values for your environment.
Import the relevant libraries:
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.graphframes._
Get the data from the dedicated SQL pool and assign it to a dataframe:
// Get a table from Synapse dedicated SQL pool, select / rename certain columns from it to vertices and edge dataframes
val df = spark.read.synapsesql("yourDB.dbo.someHierarchy")
val v = df.selectExpr("code AS id", "name AS empName", "under")
v.show
// Reformat the code/under relationship from the original table
// NB Exclude because in graph terms these don't have an edge
val e = df.selectExpr("code AS src", "under AS dst", "'under' AS relationship").where("code != under")
e.show
Create the graphframe from the vertices and edges dataframes:
// Create the graph frame
val g = GraphFrame(v, e)
print(g)
Set a checkpoint for connectedComponents:
// The connected components adds a component id to each 'group'
// Set a checkpoint to start
sc.setCheckpointDir("/tmp/graphframes-azure-synapse-notebook")
Run connected components algorithm against the data:
// Run connected components algorithm against the data
val cc = g.connectedComponents.run() // doesn't work on Spark 1.4
display(cc)
Join the original vertices dataframe and the result of the connected components algorithm and write it back to Azure Synapse dedicated SQL pool:
// Register the dataframes as temp views so they can be referenced in Spark SQL
v.createOrReplaceTempView("v")
cc.createOrReplaceTempView("cc")
val writeDf = spark.sql("SELECT v.id, v.empName, v.under, cc.component AS ultimateManager FROM v INNER JOIN cc ON v.id = cc.id")
//display(writeDf)
writeDf.write.synapsesql("yourDB.dbo.someHierarchy2", Constants.INTERNAL)
Results:

I have a feeling there would be a simpler way to accomplish this with notebooks, but I look forward to seeing some alternatives. Upvote the feedback item for recursive CTEs in Synapse here:
https://feedback.azure.com/forums/307516-azure-synapse-analytics/suggestions/14876727-support-for-recursive-cte