10

I came across this CTE solution for concatenating row elements and I thought it's brilliant and I realized how powerful CTEs can be.

However, in order to use such a tool effectively I need to know how it works internally to build that mental image which is essential for beginners, like me, to use it in different scenarios.

So I tried to slow motion the process of the above snippet and here is the code

USE [NORTHWIND]
GO
/****** Object:  Table [dbo].[Products2]  Script Date: 10/18/2011 08:55:07 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF OBJECT_ID('Products2','U') IS NOT NULL  DROP TABLE [Products2]
CREATE TABLE [dbo].[Products2](
  [ProductID] [int] IDENTITY(1,1) NOT NULL,
  [ProductName] [nvarchar](40) NOT NULL,
  [SupplierID] [int] NULL,
  [CategoryID] [int] NULL,
  [QuantityPerUnit] [nvarchar](20) NULL,
  [UnitPrice] [money] NULL,
  [UnitsInStock] [smallint] NULL,
  [UnitsOnOrder] [smallint] NULL,
  [ReorderLevel] [smallint] NULL,
  [Discontinued] [bit] NOT NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Products2] ON
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (1, N'vcbcbvcbvc', 1, 4, N'10 boxes x 20 bags', 18.0000, 39, 0, 10, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (2, N'Changassad', 1, 1, N'24 - 12 oz bottles', 19.0000, 17, 40, 25, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (3, N'Aniseed Syrup', 1, 2, N'12 - 550 ml bottles', 10.0000, 13, 70, 25, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (4, N'Chef Anton''s Cajun Seasoning', 2, 2, N'48 - 6 oz jars', 22.0000, 53, 0, 0, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (5, N'Chef Anton''s Gumbo Mix', 10, 2, N'36 boxes', 21.3500, 0, 0, 0, 1)
SET IDENTITY_INSERT [dbo].[Products2] OFF
GO
IF OBJECT_ID('DELAY_EXEC','FN') IS NOT NULL  DROP FUNCTION DELAY_EXEC
GO
CREATE FUNCTION DELAY_EXEC() RETURNS DATETIME
AS
BEGIN
  DECLARE @I INT=0
  WHILE @I<99999
  BEGIN
  SELECT @I+=1
  END
  RETURN GETDATE()
END
GO

WITH CTE (EXEC_TIME, CategoryID, product_list, product_name, length)
     AS (SELECT dbo.DELAY_EXEC(),
                CategoryID,
                CAST('' AS VARCHAR(8000)),
                CAST('' AS VARCHAR(8000)),
                0
         FROM   Northwind..Products2
         GROUP  BY CategoryID
         UNION ALL
         SELECT dbo.DELAY_EXEC(),
                p.CategoryID,
                CAST(product_list + CASE
                                      WHEN length = 0 THEN ''
                                      ELSE ', '
                                    END + ProductName AS VARCHAR(8000)),
                CAST(ProductName AS VARCHAR(8000)),
                length + 1
         FROM   CTE c
                INNER JOIN Northwind..Products2 p
                  ON c.CategoryID = p.CategoryID
         WHERE  p.ProductName > c.product_name)
SELECT *
FROM   CTE
ORDER  BY EXEC_TIME  

--SELECT CategoryId, product_list
--  FROM ( SELECT CategoryId, product_list,
--  RANK() OVER ( PARTITION BY CategoryId ORDER BY length DESC )
--   FROM CTE ) D ( CategoryId, product_list, rank )
--   WHERE rank = 1 ;

The commented block is the desired output for the concatenation problem but it's not the question here.

I've added a column EXEC_TIME to know which row got added first. The output doesn’t look right to me for two reasons

  1. I thought there would be a redundant data because of the condition p.ProductName > c.product_name in another word the first part of the CTE the empty rows are always less then values in the Product2 table so each time it runs it should bring a new set of already added rows once again. Does this make any sense?

  2. The hierarchy of data is really weird the last item should be the longest and look what is the last item? An item with length=1?

Any expert to the rescue? Thanks in advance.

Sample Results

EXEC_TIME               CategoryID  product_list                                                        product_name                      length
----------------------- ----------- ------------------------------------------------------------------- --------------------------------- -----------
2011-10-18 12:46:14.930 1                                                                                                                 0
2011-10-18 12:46:14.990 2                                                                                                                 0
2011-10-18 12:46:15.050 4                                                                                                                 0
2011-10-18 12:46:15.107 4           vcbcbvcbvc                                                          vcbcbvcbvc                        1
2011-10-18 12:46:15.167 2           Aniseed Syrup                                                       Aniseed Syrup                     1
2011-10-18 12:46:15.223 2           Chef Anton's Cajun Seasoning                                        Chef Anton's Cajun Seasoning      1
2011-10-18 12:46:15.280 2           Chef Anton's Gumbo Mix                                              Chef Anton's Gumbo Mix            1
2011-10-18 12:46:15.340 2           Chef Anton's Cajun Seasoning, Chef Anton's Gumbo Mix                Chef Anton's Gumbo Mix            2
2011-10-18 12:46:15.400 2           Aniseed Syrup, Chef Anton's Cajun Seasoning                         Chef Anton's Cajun Seasoning      2
2011-10-18 12:46:15.463 2           Aniseed Syrup, Chef Anton's Gumbo Mix                               Chef Anton's Gumbo Mix            2
2011-10-18 12:46:15.520 2           Aniseed Syrup, Chef Anton's Cajun Seasoning, Chef Anton's Gumbo Mi  Chef Anton's Gumbo Mix            3
2011-10-18 12:46:15.580 1           Changassad                                                          Changassad                        1
Community
  • 1
  • 1
Elemeen Elsayed
  • 163
  • 4
  • 14

2 Answers2

5

This is an interesting question that helped me better understand recursive CTEs too.

If you look at the execution plan you will see that a spool is used and that it has the WITH STACK property set. Which means that rows are read in a stack-like manner (Last In First Out)

So first the anchor part runs

EXEC_TIME               CategoryID  product_list  
----------------------- ----------- --------------
2011-10-18 12:46:14.930 1                         
2011-10-18 12:46:14.990 2                         
2011-10-18 12:46:15.050 4                

Then 4 is processed as that is the last row added. The JOIN returns 1 row that is added to the spool then this newly added row is processed. In this case the Join returns nothing so there is nothing additional added to the spool and it moves on to processing the CategoryID = 2 row.

This returns 3 rows which are added to the spool

Aniseed Syrup
Chef Anton's Cajun Seasoning
Chef Anton's Gumbo Mix   

then each of these rows are processed in turn in a similar LIFO fashion with any child rows added being dealt with first before processing can move on to the sibling rows. Hopefully you can see how this recursive logic explains your observed results but just in case you can't a C# simulation

using System;
using System.Collections.Generic;
using System.Linq;

namespace Foo
{
    internal class Bar
    {
        private static void Main(string[] args)
        {
            var spool = new Stack<Tuple<int, string, string>>();

            //Add anchor elements
            AddRowToSpool(spool, new Tuple<int, string, string>(1, "", ""));
            AddRowToSpool(spool, new Tuple<int, string, string>(2, "", ""));
            AddRowToSpool(spool, new Tuple<int, string, string>(4, "", ""));

            while (spool.Count > 0)
            {
                Tuple<int, string, string> lastRowAdded = spool.Pop();
                AddChildRows(lastRowAdded, spool);
            }

            Console.ReadLine();
        }

    private static void AddRowToSpool(Stack<Tuple<int, string, string>> spool,
                                      Tuple<int, string, string> row)
        {
            Console.WriteLine("CategoryId={0}, product_list = {1}",
                              row.Item1,
                              row.Item3);
            spool.Push(row);
        }

    private static void AddChildRows(Tuple<int, string, string> lastRowAdded,
                                     Stack<Tuple<int, string, string>> spool)
        {
            int categoryId = lastRowAdded.Item1;
            string productName = lastRowAdded.Item2;
            string productList = lastRowAdded.Item3;

            string[] products;

            switch (categoryId)
            {
                case 1:
                    products = new[] {"Changassad"};
                    break;
                case 2:
                    products = new[]
                                   {
                                       "Aniseed Syrup",
                                       "Chef Anton's Cajun Seasoning",
                                       "Chef Anton's Gumbo Mix "
                                   };
                    break;
                case 4:
                    products = new[] {"vcbcbvcbvc"};
                    break;
                default:
                    products = new string[] {};
                    break;
            }


            foreach (string product in products.Where(
                product => string.Compare(productName, product) < 0))
            {
                string product_list = string.Format("{0}{1}{2}",
                                                 productList,
                                                 productList == "" ? "" : ",",
                                                 product);

                AddRowToSpool(spool,
                              new Tuple<int, string, string>
                                  (categoryId, product, product_list));
            }
        }
    }
}

Returns

CategoryId=1, product_list =
CategoryId=2, product_list =
CategoryId=4, product_list =
CategoryId=4, product_list = vcbcbvcbvc
CategoryId=2, product_list = Aniseed Syrup
CategoryId=2, product_list = Chef Anton's Cajun Seasoning
CategoryId=2, product_list = Chef Anton's Gumbo Mix
CategoryId=2, product_list = Chef Anton's Cajun Seasoning,Chef Anton's Gumbo Mix
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Cajun Seasoning
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Gumbo Mix
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Cajun Seasoning,Chef Anton's Gumbo Mix
CategoryId=1, product_list = Changassad
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Interesting. But I'd add the caveat that you shouldn't *expect* that it will always use the `WITH STACK` option - it's very much an implementation detail in the current version, so the processing order should not be assumed to be as you've shown above. – Damien_The_Unbeliever Oct 18 '11 at 14:32
  • @Damien_The_Unbeliever - Yes agreed that this order of operations shouldn't be relied upon in any way. – Martin Smith Oct 18 '11 at 14:38
  • Thanks Indeed for an elaborate and simple explanation – Elemeen Elsayed Oct 19 '11 at 16:12
4

The page Recursive Queries Using Common Table Expressions describes the logic of CTEs:

The semantics of the recursive execution is as follows:

  1. Split the CTE expression into anchor and recursive members.

  2. Run the anchor member(s) creating the first invocation or base result set (T0).

  3. Run the recursive member(s) with Ti as an input and Ti+1 as an output.

  4. Repeat step 3 until an empty set is returned.

  5. Return the result set. This is a UNION ALL of T0 to Tn.

However, that's only the logical flow. As always, with SQL, the server is free to reorder operations as it sees fit, if the result will be "the same", and the reordering is perceived to provide the results more efficiently.

The presence of your function with side effects (causing a delay, then returning GETDATE()) isn't something that would normally be considered when deciding whether to reorder operations.

One obvious way in which the query may be reordered is that it may decide to start working on result set Ti+1 before it has fully created result set Ti - it may be more efficient to do this than to fully construct Ti first, since the new rows are definitely already in memory and have been accessed recently.

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
  • Thanks for your answer. by the way "The presence of my function" was not for Sql Server to reorder operations. it was simply a way to find out which/when a row was executed. – Elemeen Elsayed Oct 21 '11 at 15:47