4

I have data in a self-join hierarchical table where Continents have many Countries have many Regions have many States have many Cities.

Self-joining table structure:

|-------------------------------------------------------------|
| ID  | Name          |  Type       |  ParentID  | IsTopLevel |
|-------------------------------------------------------------|
| 1   | North America |  Continent  |  NULL      | 1          |
| 12  | United States |  Country    |  1         | 0          |
| 113 | Midwest       |  Region     |  12        | 0          |
| 155 | Kansas        |  State      |  113       | 0          |
| 225 | Topeka        |  City       |  155       | 0          |
| 2   | South America |  Continent  |  NULL      | 1          |
| 22  | Argentina     |  Country    |  2         | 0          |
| 223 | Southern      |  Region     |  22        | 0          |
| 255 | La Pampa      |  State      |  223       | 0          |
| 777 | Santa Rosa    |  City       |  255       | 0          |
|-------------------------------------------------------------|

I have been able to successfully use a recursive CTE to get the tree structure and depth of each node. Where I am failing is using a pivot to create a nice list of all bottom locations and their corresponding parents at each level.

The expected results:

|------------------------------------------------------------------------------------|
| Continent     | Country       | Region   | State    | City       | Bottom_Level_ID |
|------------------------------------------------------------------------------------|
| North America | United States | Midwest  | Kansas   | Topeka     | 234             |
| South America | Argentina     | Southern | La Pampa | Santa Rosa | 777             |
|------------------------------------------------------------------------------------|

There are a few key points I should clarify.

  1. Every single entry has a bottom level and a top level. There are no cases where all five Types are not present for a given location.

  2. If I filled out this data, I'd have 50 entries for North America at the State level, so you can imagine how immense this table is at the City level for every continent on the planet. Billions of rows.

  3. The reason this is a necessity is because I need to be able to join onto a historical table of all addresses a person has lived at, and journey up the tree. I figure if I have the LocationID from that table, I can just LEFT JOIN onto a View of this query and nab the appropriate columns.

  4. This is an old database, 2005, and I don't have sysadmin or control of the schema.

My CTE Code

--CTE
;WITH Tree
AS (
  SELECT ID, Name, ParentID, Type, 1 as Depth
  FROM LocationTable
  WHERE IsTopLevel = 1
  UNION ALL
  SELECT L.ID, L.Name, L.ParentID, L.Type, T.Depth+1
  FROM Tree T
    JOIN LocationTable L
      ON L.ParentGUID = T.GUID
)

Good solid data, in a mostly useful format. BUT then I got to thinking about it and isn't the table structure already in this format, so why would I bother doing a depth tree search if I wasn't going to join the entries together at the same time?

Anyway, here was the rest.

The Pivot Attempt

;WITH Tree
AS (
  SELECT ID, Name, ParentID, Type
  FROM LocationTable
  WHERE IsTopLevel = 1
  UNION ALL
  SELECT L.ID, L.Name, L.ParentID, L.Type
  FROM Tree T
    JOIN LocationTable L
      ON L.ParentGUID = T.GUID
)
select *
from Tree
pivot (
    max(Name)
    for Type in ([Continent],[Country],[Region],[State],[City])
) pvt

And now I have everything by Type in a column, with nulls for everything else. As I have struggled with before, I need to filter/join the CTE data before I attempt my pivot, but I have no idea where to start with that piece. Everything I have tried is soooooooooo sloooooooow.

Everytime I think I understand CTEs and Pivot, something new makes me extremely humbled. Please help me. ; ;

BlueCucumber
  • 63
  • 1
  • 7
  • Would it be possible to write the whole lot in a new datatable with 5 columns *side-by-side*? If I understand you correctly, there are **always** five levels from top to bottom... The flexibility of a tree and a recursive approach seems not to be needed at all... – Shnugo Sep 01 '16 at 22:22
  • I cannot alter the schema. – BlueCucumber Sep 02 '16 at 18:38

2 Answers2

4

If your structure is as clean as you describe it (no gaps, 5 levels always) you might go the easy way:

This data really demands for a classical 1:n-table-tree, where your Countries, States etc. live in their own tables and link to their parent record

Make sure there's an index on ParentID and ID!

DECLARE @tbl TABLE(ID INT,Name VARCHAR(100),Type VARCHAR(100),ParentID INT,IsTopLevel BIT);
INSERT INTO @tbl VALUES
 (1,'North America','Continent',NULL,1)
,(12,'United States','Country',1,0)
,(113,'Midwest','Region',12,0)
,(155,'Kansas','State',113,0)
,(225,'Topeka','City',155,0)
,(2,'South America','Continent',NULL,1)
,(22,'Argentina','Country',2,0)
,(223,'Southern','Region',22,0)
,(255,'La Pampa','State',223,0)
,(777,'Santa Rosa','City',255,0);

SELECT Level1.Name AS Continent
      ,Level2.Name AS Country
      ,Level3.Name AS Region
      ,Level4.Name AS State
      ,Level5.Name AS City
      ,Level5.ID AS Bottom_Level_ID
FROM @tbl AS Level1 
    INNER JOIN @tbl AS Level2 ON Level1.ID=Level2.ParentID
        INNER JOIN @tbl AS Level3 ON Level2.ID=Level3.ParentID
            INNER JOIN @tbl AS Level4 ON Level3.ID=Level4.ParentID
                INNER JOIN @tbl AS Level5 ON Level4.ID=Level5.ParentID
WHERE Level1.ParentID IS NULL

The result

Continent       Country         Region      State       City      Bottom_Level_ID
North America   United States   Midwest     Kansas      Topeka      225
South America   Argentina       Southern    La Pampa    Santa Rosa  777
Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • I ran this on a small sample set and it never occurred to me that I could do it this way. Do you know if this is fast for billions of relationship links? I started running it on my db and its still going. – BlueCucumber Sep 02 '16 at 18:46
  • Billions of rows is quite a lot... Are there indexes on ID and ParentID? As far as I understood you have to bind location to addresses. So you do not need the full result, just a *back-chain* up from a `Bottom_Level_ID`. Correct? This should work instantly... Anyway this **must** be faster than any recursive approach, which is a hidden RBAR... One good choice might be - if this works for you - to use my query (add alle the LevelIDs) and use `SELECT ... INTO NewTable FROM ...`. This may run for hours, but only once... Place indexes and you will have a lighting fast data source... – Shnugo Sep 02 '16 at 20:01
0

Another solution with CTE could be :

;WITH Tree
AS (
  SELECT cast(NULL as varchar(100)) as C1, cast(NULL as varchar(100)) as C2, cast(NULL as varchar(100)) as C3, cast(NULL as varchar(100)) as  C4, Name as C5, ID as B_Level
  FROM LocationTable
  WHERE IsTopLevel = 1
  UNION ALL
  SELECT T.C2, T.C3, T.C4, T.C5, L.Name, L.ID
  FROM Tree T
    JOIN LocationTable L
      ON L.ParentID = T.B_Level
)
select *
from Tree
where C1 is not null
Polux2
  • 552
  • 3
  • 12
  • Msg 240, Level 16, State 1, Line 1 Types don't match between the anchor and the recursive part in column "C4" of recursive query "Tree". – BlueCucumber Sep 02 '16 at 18:41
  • You're right. I've corrected my mistake by replacing NULL by cast(NULL as varchar(100)). Now it's works. – Polux2 Sep 02 '16 at 19:09