Improve slow query when the number of the records doubles

Question

I have a query that select data from a table.

This table records number vary based on users input.

When the number of records is around 60,000 or less it is pretty fast ( less than 2 minutes).

But when I double number of records around 120,000 it takes more than an hour! and THEN I have to kill the process.

I don't know why it is extremely slow.

I added many indexes but its still too slow.

Here is the query

DECLARE @MaxID AS bigint

SELECT @MaxId = MAX(iID) from TagsTemp
    
SELECT 
    RepID, tag, xmiid, ibegin, iend, 
    confidence, polarity, uncertainty, conditional, generic, historyOf,
    codingScheme, code, cui, /*U.tui,*/ preferredText , --ISNULL(tag2, tag3) AS tagValue, 
    ISNULL(ibegin2, ibegin3) AS ibeginValue, ISNULL(iend2, iend3) AS iendValue,
    dbo.RepGetValue(RepID, ibegin, iend, 
    ISNULL(ibegin2, ibegin3), ISNULL(iend2, iend3))
FROM 
    (SELECT DISTINCT 
         T.RepID, dbo.ShortTag(T.tag) AS tag, T.xmiid, 
         CAST(T.ibegin AS bigint) AS ibegin, CAST(T.iend AS bigint) AS iend, 
         T.confidence, T.polarity, T.uncertainty, T.conditional, T.generic, T.historyOf,
         dbo.ShortTag(U.codingScheme) AS codingScheme, U.code, U.cui, /*U.tui, */U.preferredText,
         dbo.ShortTag(L3.tag) AS tag2, L3.ibegin AS ibegin2, L3.iend AS iend2,
         dbo.ShortTag(M1.tag) AS tag3, M1.ibegin AS ibegin3, M1.iend AS iend3, 
         ROW_NUMBER() OVER(PARTITION BY T.RepID, T.tag, T.xmiid,
         CAST(T.ibegin AS bigint), CAST(T.iend AS bigint), 
         T.confidence, T.polarity, T.uncertainty, T.conditional, T.generic, T.historyOf,
         U.codingScheme, U.code, U.cui, /*U.tui,*/ U.preferredText , L3.tag, L3.ibegin, L3.iend 
     ORDER BY 
         T.RepID, T.xmiid, CAST(T.ibegin AS bigint), CAST(T.iend AS bigint), 
         CAST(M1.ibegin AS bigint), CAST(M1.iend AS bigint) DESC, 
         CASE M1.tag 
            WHEN 'textsem:Mandy' THEN 1 
            WHEN 'textsem:Franc' THEN 2 
            WHEN 'textsem:Roger' THEN 3 
            WHEN 'syntax:Numan' THEN 4 
            WHEN 'textsem:Danna' THEN 5 
            WHEN 'textsem:Rami' THEN 6 
         END) AS RowNo
     FROM 
         TagsTemp T 
     INNER JOIN 
         TagsTemp U ON T.RepID = U.RepID 
                    AND T.Tag IN ('textsem:Michael', 'textsem:Simon', 'textsem:Anna','textsem:Evan','textsem:Paul','textsem:Dines','textsem:Larry')
                    AND U.Tag = 'refsem:Usman'
                    AND T.ontologyConceptArr LIKE '%' + CAST(U.xmiid AS varchar(100)) + '%'
     LEFT OUTER JOIN 
         TagsTemp L1 ON T.tag = 'textsem:Larry' 
                     AND L1.tag = 'relation:ResultOfTextRelation'
                     AND T.RepID = L1.RepID 
                     AND T.LabValue = L1.xmiid 
                     AND ISNULL(L1.arg2, '') <> ''
     LEFT OUTER JOIN 
         TagsTemp L2 ON L1.tag = 'relation:ResultOfTextRelation' 
                     AND L2.tag = 'relation:RelationArgument' 
                     AND L1.RepID = L2.RepID 
                     AND L1.arg2 = L2.xmiid 
     LEFT OUTER JOIN 
         TagsTemp L3 ON L2.tag = 'relation:RelationArgument' 
                     AND  L3.tag IN ('syntax:Numan','textsem:Danna','textsem:Roger','textsem:Mandy', 'textsem:Franc','textsem:Rami')
                     AND L2.RepID = L3.RepID 
                     AND L2.argument = L3.xmiid
     LEFT OUTER JOIN 
         TagsTemp M1 ON T.RepID = M1.RepID 
                     AND T.tag IN ('textsem:Michael', 'textsem:Larry') 
                     AND M1.tag IN ('syntax:Numan','textsem:Danna','textsem:Roger', 'textsem:Mandy',/*'textsem:Rami', */ 'textsem:Franc')
                     AND CAST(M1.ibegin AS bigint) > CAST(T.iend AS bigint) 
                     AND CAST(M1.ibegin AS bigint) - CAST(T.iend AS bigint) < 4 
     WHERE 
         T.iID <= @MaxID) X
WHERE
    RowNo =    1
ORDER BY 
    RepID, tag, xmiid, CAST(ibegin AS bigint) , CAST(iend AS bigint) , 
    confidence, polarity, uncertainty, conditional, generic, historyOf,
    codingScheme, code, cui, /*U.tui,*/ preferredText , tag2, ibegin2, iend2,
    tag3,ibegin3, iend3

Here is the execution plan when I run for 60,000

https://www.brentozar.com/pastetheplan/?id=r1XHRQXpD

iID is a clustered Primary Key, for sure this is not the reason — asmgx, Dec 25 '20 at 09:31
Seeing a select distinct is usually a sign of a incorrectly crafted query. And seeing a row-number that is partitioned over many fields that are sometimes cast even, and ordered by a slew of columns... Perhaps you should look back on your requirements and see if they're not over the top. — TT., Dec 25 '20 at 09:32
Well, i suggest to test the big subquery in isolation first. — Tarik, Dec 25 '20 at 09:35
This is ugly: `CASE M1.tag WHEN 'textsem:Mandy' then 1 WHEN 'textsem:Franc' then 2 WHEN 'textsem:Roger' then 3 WHEN 'syntax:Numan' then 4 WHEN 'textsem:Danna' then 5 WHEN 'textsem:Rami' then 6 END)`. Have a small table with the string and numbers and then use it in the join. — Tarik, Dec 25 '20 at 09:40
Your query is so complex that makes it close to unmaintainable. I would put this in a stored procedure and break all of this into manageable pieces. Think about the poor guy who will have decipher this. — Tarik, Dec 25 '20 at 09:43
@Tarik table structure is a copy of an XML files recieved. this query to simplify the table data — asmgx, Dec 25 '20 at 09:49
Well, the aim might be noble, but IMHO, the method is overly complex. You need to break down all of this into manageable pieces that you can test in isolation even from a performance point of view. See my other suggestions. — Tarik, Dec 25 '20 at 09:53
@Tarik I think thats the way to do. I will break it down into more manageble peices — asmgx, Dec 25 '20 at 09:55
Good luck! Also, closely document each step so as to not get lost when you get back to this code 6 months from now. Put plenty of comments explaining why you do this and that. — Tarik, Dec 25 '20 at 09:59
It looks like you are operating with JSON. Maybe you can parse it and then use in your query. That should be definitely faster. — Slava Murygin, Dec 25 '20 at 15:21

score 1 · Answer 1 · answered Dec 25 '20 at 15:38

You have a Key lookup with 37K lookups Try to delete index "[NonClusteredIndex-20201223-150141]" and create another one instead:

CREATE INDEX NCI_TagsTemp_1 ON TagsTemp (iID, RepID, tag)
INCLUDE (xmiid, ibegin, iend, ontologyConceptArr, confidence, polarity, uncertainty, conditional, generic, historyOf, labValue)

score 1 · Accepted Answer · answered Dec 26 '20 at 06:49

I don't have exact answer but I can help you to improve performance

before executing this query run this both statement

SET STATISTICS IO ON 
SET STATISTICS TIME ON

Now run your query and analyze result in message tab.

Use this site for more readability http://statisticsparser.com/

Try to find where physicals read is more, this site to understand physical and logical reads https://vaishaligoilkar3322.medium.com/physical-and-logical-reads-in-sql-server-c6d62e65e359

Now you need find exact block which is taking time and try to break it down.

Try to decrease physical and logical reads and repeat this step until you find the root cause.

This the best way you can improve this query performance.

Hope this helps you.

Improve slow query when the number of the records doubles

2 Answers2