How to diagnose slow/inconsistent SQL Server query?

Question

Running Windows Server 2012, Hyper-V, SQL Server 2012 Active/Passive failover cluster w/two 8-processor, 60GB nodes, single instance, 300 databases. This query produces inconsistent results, running anywhere between 10 and 30 seconds.

DECLARE @OrgID           BigInt = 780246
DECLARE @ActiveOnly      Bit = 0
DECLARE @RestrictToOrgID Bit = 0;

WITH og (OrgID, GroupID) AS
  (
  SELECT ID, ID FROM Common.com.Organizations WHERE ISNULL(ParentID, 0) <> ID 
  UNION ALL
  SELECT o.ID, og.GroupID FROM Common.com.Organizations o JOIN og ON og.OrgID = o.ParentID
  )

   SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
          v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide, 
          a.*
     FROM og
     JOIN books.Organizations           bo ON  bo.CommonID = og.OrgID
     JOIN books.Organizations           po ON  po.CommonID = og.GroupID
     JOIN books.Entities                e  ON   e.OrgID    = po.ID
     JOIN Vendors                       v  ON   v.ID       =  e.ID 
                                           AND (e.OrgID    = bo.ID OR v.DistrictWide = 1)
LEFT JOIN Addresses                     a  ON   a.ID       =  e.AddressID
    WHERE bo.ID = @OrgID
      AND (@ActiveOnly      = 0 OR e.Active = 1)
      AND (@RestrictToOrgID = 0 OR e.OrgID  = @OrgID)
 ORDER BY e.EntityName

Replacing the LEFT JOIN Addresses with JOIN Addresses

     JOIN Addresses                     a  ON   a.ID       =  e.AddressID
    WHERE bo.ID = @OrgID
      AND (@ActiveOnly      = 0 OR e.Active = 1)
      AND (@RestrictToOrgID = 0 OR e.OrgID  = @OrgID)
 ORDER BY e.EntityName

or reducing the length of the columns selected from Addresses to less than 100 bytes

   SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
          v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide, 
          a.Fax

reduces the execution time to about .5 seconds.

In addition, using SELECT DISTINCT and joining books.Entities to Vendors

   SELECT DISTINCT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
          v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide, 
          a.*
     FROM og
     JOIN books.Organizations           bo ON  bo.CommonID = og.OrgID
     JOIN books.Organizations           po ON  po.CommonID = og.GroupID
     JOIN Vendors                       v  
     JOIN books.Entities                e  ON   v.ID       =  e.ID 
                                           ON   e.OrgID    = bo.ID OR (e.OrgID = po.ID AND v.DistrictWide = 1)

Reduces the time to about .75 seconds.

Summary

These conditions indicate there is some kind of resource limitation in the SQL Server instance that is causing these erratic results and I don't know how to go about diagnosing it. If I copy the offending database to my laptop running SQL Server 2012, the problem does not present. I can continue to change the SQL around and hope for the best but I would prefer to find a more definitive solution.

Any suggestions are appreciated.

Update 2/27/18

The execution plan for the unmodified query shows a Clustered Index Seek against the Addresses table as the problem.

Reducing the length of the columns selected from Addresses to less than 100 bytes

   SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
          v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide, 
          a.Fax

replaced the Clustered Index Seek with a Clustered Index Scan to retrieve a.Fax and a Hash Match to join this value to the results.

The Addresses table primary key is created as follows:

   ALTER TABLE dbo.Addresses 
ADD CONSTRAINT PK_Addresses PRIMARY KEY CLUSTERED (ID ASC)
          WITH (PAD_INDEX = OFF, 
                STATISTICS_NORECOMPUTE = OFF, 
                SORT_IN_TEMPDB         = OFF, 
                IGNORE_DUP_KEY         = OFF, 
                ONLINE                 = OFF, 
                ALLOW_ROW_LOCKS        = ON, 
                ALLOW_PAGE_LOCKS       = ON) 
            ON  PRIMARY

This index is defragged and optimized, as needed, every day.

So far, I can find nothing helpful as to why the Clustered Index Seek adds so much time to the query.

Have you looked at the execution plan to see if there were any obvious issues? — Ken White, Feb 23 '19 at 01:45
Yes, and I have worked with exec plans many times. This case does not show anything obvious. I'm currently looking at a diff comparison between the original and the modification to read only `a.Fax` but, again, there is nothing obvious. Any suggestions on how to analyze the plans? — sthames42, Feb 25 '19 at 17:31
I have updated this post to include analysis of the execution plan. — sthames42, Feb 27 '19 at 22:41

score 1 · Accepted Answer · answered Mar 07 '19 at 19:08

Ok, as is so often the case, there was not one problem, but two problems. This is an example of where complex problem analysis can lead to the wrong conclusions.

The primary problem turned out to be the recursive CTE og which returns a pivot table giving the parent/child relationships between organizations. However, analysis of the execution plans appeared to indicate the problem was some kind of glitch in the optimizer related to the amount of data being returned from a left-joined table. This may be entirely the result of my inability to properly analyze an execution plan but there does appear to be some issue in how SQL Server 2012 SP4 creates an execution plan under these circumstances.

While far more significant on our production server, the problem with SQL Server's optimization of recursive CTE was apparent on both my localhost, running 2012 SP4, and staging server, running SP2. But it took further analysis and some guesswork to see it.

The Solution

I replaced the recursive CTE with a pivot table and added a trigger to the Organizations table to maintain it.

USE Common
GO

CREATE VIEW com.OrganizationGroupsCTE
AS
  WITH cte (OrgID, GroupID) AS
    (
    SELECT ID, ID FROM com.Organizations WHERE ISNULL(ParentID, 0) <> ID 
    UNION ALL
    SELECT o.ID, cte.GroupID FROM com.Organizations o JOIN cte ON cte.OrgID = o.ParentID
    )

  SELECT OrgID, GroupID FROM cte
GO

CREATE TABLE com.OrganizationGroups 
  (
  OrgID   BIGINT, 
  GroupID BIGINT
  )

INSERT com.OrganizationGroups 
SELECT OrgID, GroupID 
  FROM com.OrganizationGroupsCTE
GO

CREATE TRIGGER TR_OrganizationGroups ON com.Organizations AFTER INSERT,UPDATE,DELETE
AS
  DELETE og 
    FROM com.OrganizationGroups og 
    JOIN deleted                d   ON d.ID IN (og.groupID, og.orgID);

  INSERT com.OrganizationGroups 
  SELECT orgID, groupID 
    FROM inserted               i 
    JOIN OrganizationGroupsCTE  cte ON i.ID IN (cte.orgID, cte.groupID)
GO

After modifying the query to use the pivot table,

   SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
          v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide, 
          a.*
     FROM Common.com.OrganizationGroups og
     JOIN books.Organizations           bo ON  bo.CommonID = og.OrgID
     JOIN books.Organizations           po ON  po.CommonID = og.GroupID
     JOIN books.Entities                e  ON   e.OrgID    = po.ID
     JOIN Vendors                       v  ON   v.ID       =  e.ID 
                                           AND (e.OrgID    = bo.ID OR v.DistrictWide = 1)
LEFT JOIN Addresses                     a  ON   a.ID       =  e.AddressID
    WHERE bo.ID = @OrgID
      AND (@ActiveOnly      = 0 OR e.Active = 1)
      AND (@RestrictToOrgID = 0 OR e.OrgID  = @OrgID)
 ORDER BY e.EntityName

SQL Server performance was improved, and consistent, in all three environments. Problems on the production server have now been eliminated.

How to diagnose slow/inconsistent SQL Server query?

1 Answers1