In my project I ran into a problem with the T-SQL code below.

  1. Step 1 populates the UserModules table with the parent modules and their subscribed users.
  2. Step 2 looks up the child modules of the step 1 modules in the Modules_Hirarchy table and inserts them into UserModules, mapping each child module to the users subscribed to its parent module. This step repeats until no more child modules are found.

Problem:

In step 2, the WHILE loop and the SELECT statement use correlated subqueries, and the UserModules table appears in both the INSERT and its SELECT. This hurts performance, and the query frequently fails with the lock escalation error below.

The final UserModules table holds about 42 million rows, and it is expected to grow.

Error Message: “The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.”

How can I optimize step 2 to resolve this issue?

Step 1:

INSERT INTO UserModules(ModuleID, UserID)
  SELECT ModuleID, UserID
  FROM TABLEA a
  INNER JOIN TABLEB b ON a.ID = b.ID

Step 2:

DECLARE @cnt int
SET @cnt = 1

WHILE( @cnt > 0 )      
BEGIN      

  SET @cnt = (SELECT COUNT(DISTINCT s.moduleid)
              FROM Modules_Hirarchy s WITH (nolock), Modules t      
              WHERE s.ParentModuleId = t.ModuleId      
              ------------      
                AND NOT EXISTS       
                 (SELECT ModuleId + EndUserId 
                  FROM UserModules  r      
                  WHERE s.moduleid = r.moduleid 
                    AND t.EndUserId = r.EndUserId)
                AND s.moduleid + t.EndUserId NOT IN 
                  (SELECT CAST(ModuleId AS varchar) + EndUserId 
                   FROM UserModules ))      

  IF @cnt = 0      
    BREAK      

  INSERT INTO UserModules (ModuleId, EndUserId)      
    SELECT DISTINCT s.moduleid, t.EndUserId       
    FROM Modules_Hirarchy s WITH (nolock), UserModules  t      
    WHERE s.ParentModuleId = t.ModuleId      
      AND NOT EXISTS       
       (SELECT ModuleId + EndUserId 
        FROM UserModules  r      
        WHERE s.moduleid = r.moduleid 
          AND t.EndUserId = r.EndUserId)

END  
James Z

  • Any reasonable modern version of SQL Server (>= 2005) has [CTEs](http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx) that are designed to do this kind of recursion without you having to write explicit looping code. – Damien_The_Unbeliever Jan 22 '13 at 14:57
  • post some sample data please – WKordos Jan 22 '13 at 15:02

1 Answer

Here is some sample data to play with:

create table #UserModules(ModuleID int, UserID int)

create table #Modules_Hirarchy(ParentModuleID int, ChildModuleID int)

insert into #UserModules (ModuleID , UserID)
values(1,1)
,(2,1)
,(3,1)
,(4,1)
,(5,1)
,(6,2)
,(7,2)

insert into #Modules_Hirarchy(ParentModuleID , ChildModuleID )
values (null,1)
,(1,2)
,(2,3)
,(3,4)
,(3,5)
,(null,6)
,(6,7)

Solution, using a recursive CTE:

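-- Recursive CTE: walk the hierarchy from the root modules down,
-- carrying each subscribed user along to every descendant module.
-- The leading semicolon terminates any preceding statement in the batch.
;with cts (ModuleID, UserID, parentModule) as
(
    -- Anchor: users subscribed to root modules (no parent)
    select a.ModuleID, a.UserID, cast(null as int) as parentModule
    from #UserModules a
    join #Modules_Hirarchy b on a.ModuleID = b.ChildModuleID
    where b.ParentModuleID is null

    union all

    -- Recursive step: each child module gets the users of its parent
    select b.ChildModuleID as ModuleID, a.UserID, b.ParentModuleID
    from cts a
    join #Modules_Hirarchy b on a.ModuleID = b.ParentModuleID
)
select *
into #RESULT
from cts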
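
To see what the CTE produces on the sample data, a quick check like the following can help. It assumes the #RESULT table created by the query above; the ModuleCount alias is just an illustrative name.

```sql
-- Count how many modules each user ended up with in #RESULT.
select UserID, count(*) as ModuleCount
from #RESULT
group by UserID
order by UserID
-- With the sample data: UserID 1 -> 5 modules (1,2,3,4,5),
--                       UserID 2 -> 2 modules (6,7)
```

The anchor picks up (1,1) and (6,2), the only subscriptions to root modules; every other row comes from walking down the hierarchy. Note that SQL Server stops a recursive CTE after 100 levels by default, so a deeper hierarchy would need an OPTION (MAXRECURSION n) hint on the outer statement.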

Edit: it's hard to say :) because there are too many variables, but here are things you should do to make the query efficient:

  1. Create separate nonclustered indexes on the columns ModuleID, ParentModuleID and ChildModuleID.

  2. You probably don't want to query for all of the groups, only for explicit ones. Filter out as many groups as possible in the anchor statement:

    select a.ModuleID, a.UserID, cast(null as int) as parentModule
    from #UserModules a
    join #Modules_Hirarchy b on a.ModuleID = b.ChildModuleID
    where b.ParentModuleID is null
      and a.ModuleID in (listOfModules)

  3. Add a unique index on the columns (ParentModuleID, ChildModuleID), because non-unique rows there can lead to a huge amount of row duplication.
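
The index suggestions in points 1 and 3 could look roughly like this on the sample tables. This is only a sketch: the index names are made up for illustration, and on the real tables the best choice depends on the existing schema and keys, which are not shown here.

```sql
-- Point 1: support the join on ModuleID in the anchor query
create nonclustered index IX_UserModules_ModuleID
    on #UserModules (ModuleID)

-- Point 1: support the lookup of a module by its child id
create nonclustered index IX_Modules_Hirarchy_ChildModuleID
    on #Modules_Hirarchy (ChildModuleID)

-- Point 3: forbid duplicate (parent, child) pairs.
-- This statement fails if duplicates already exist in the table.
create unique nonclustered index UX_Modules_Hirarchy_Parent_Child
    on #Modules_Hirarchy (ParentModuleID, ChildModuleID)
```

Because ParentModuleID is the leading column of the unique index, that index can also serve the recursive join on ParentModuleID, so a separate single-column index on ParentModuleID may turn out to be unnecessary.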

Beyond that, it depends on the data selectivity of ParentModuleID and ChildModuleID, but you can't do much about that.

I think it will work fine for big data sets, as the predicates are simple, as long as data selectivity is high.
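
If the rows from #RESULT then need to go back into the user/module table, one possible approach is to insert them in batches, so that each statement touches a limited number of rows. This is only a sketch: the batch size is an arbitrary assumption, the variable names are made up, and the temp tables are the ones from the sample above.

```sql
declare @BatchSize int = 4000   -- arbitrary assumption; tune for your system
declare @Rows int = 1

while @Rows > 0
begin
    -- Insert the next batch of (ModuleID, UserID) pairs not yet present
    insert into #UserModules (ModuleID, UserID)
    select distinct top (@BatchSize) r.ModuleID, r.UserID
    from #RESULT r
    where not exists (select 1
                      from #UserModules u
                      where u.ModuleID = r.ModuleID
                        and u.UserID = r.UserID)

    set @Rows = @@ROWCOUNT   -- 0 once nothing is left to insert
end
```

Run outside an explicit transaction, each batch commits on its own, which may help with both lock pressure and log growth. Whether it actually does depends on the recovery model and the server configuration.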

WKordos
  • I have a question: can this solution be scaled up to a huge number of records, i.e. 40-plus million? – user2000502 Jan 23 '13 at 09:49
  • Thank you for your suggestions. I tried this solution in my project and it's working great, with one exception: sometimes it fails with the error Transaction Log full on 'MyDatabase'. Please suggest a solution to avoid this problem. – user2000502 Jan 23 '13 at 14:55
  • Uh, I don't really know. I think you should post the entire transaction as a new question, with info on how big your log file is, etc. – WKordos Jan 24 '13 at 12:03