TSQL optimize multiple Intersect

Question

The problem query uses multiple INTERSECT operators.

Changing it to an in ('alpha','beta','gamma','delta','epsilon','phi') ... group by ... having count() = 6 is not an option, because the application supports LIKE with wildcards (e.g. alpha%). Even so, the count() = 6 query runs in less than 1 second.
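Here is a minimal sketch of the two shapes on made-up data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so WITH (NOLOCK) and other T-SQL-only syntax is left out. The tiny schema and the sample rows are assumptions, not the real tables. With exact words, the IN ... GROUP BY ... HAVING COUNT(...) = N form and the INTERSECT form return the same documents:

```python
import sqlite3

# Made-up miniature copies of the question's tables; SQLite stands in for
# SQL Server, so WITH (NOLOCK) and query hints are omitted.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE FTSwordDef (ID INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE ftsIndexWordOnce (sID INTEGER NOT NULL, wordID INTEGER NOT NULL,
                                   PRIMARY KEY (sID, wordID));
    INSERT INTO FTSwordDef VALUES (1, 'alpha'), (2, 'alphabet'), (3, 'beta'), (4, 'gamma');
    -- documents: 10 = alpha beta gamma, 20 = alpha beta, 30 = alphabet beta gamma,
    --            40 = gamma, 50 = alpha alphabet beta gamma
    INSERT INTO ftsIndexWordOnce VALUES
        (10, 1), (10, 3), (10, 4),
        (20, 1), (20, 3),
        (30, 2), (30, 3), (30, 4),
        (40, 4),
        (50, 1), (50, 2), (50, 3), (50, 4);
""")
terms = ["alpha", "beta", "gamma"]

# "count() = N" form: one pass over the index, keep documents containing every word.
marks = ", ".join("?" * len(terms))
count_sql = f"""
    SELECT i.sID
    FROM ftsIndexWordOnce AS i
    JOIN FTSwordDef AS d ON d.ID = i.wordID
    WHERE d.word IN ({marks})
    GROUP BY i.sID
    HAVING COUNT(DISTINCT d.ID) = ?
"""
by_count = sorted(s for (s,) in con.execute(count_sql, [*terms, len(terms)]))

# INTERSECT form from the question: one branch per term.
branch = ("SELECT sID FROM ftsIndexWordOnce "
          "WHERE wordID IN (SELECT ID FROM FTSwordDef WHERE word LIKE ?)")
intersect_sql = " INTERSECT ".join([branch] * len(terms))
by_intersect = sorted(s for (s,) in con.execute(intersect_sql, terms))

print(by_count, by_intersect)  # [10, 50] [10, 50]
```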

With wildcards I could use multiple joins, and that is how the query used to be structured. At 4 or fewer terms, INTERSECT performs better than the multiple joins, but unfortunately I did not test at 5 or more.
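The question does not show the earlier multiple-join version, so the sketch below is only a guess at its shape. It adds one extra copy of ftsIndexWordOnce and FTSwordDef per term and ties the copies together on sID. Wildcards fit this shape naturally because each term gets its own LIKE. SQLite stands in for SQL Server, the data is made up, and multi_join_sql is a hypothetical helper:

```python
import sqlite3

# Made-up sample data; SQLite stands in for SQL Server (no NOLOCK or join hints).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE FTSwordDef (ID INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE ftsIndexWordOnce (sID INTEGER NOT NULL, wordID INTEGER NOT NULL,
                                   PRIMARY KEY (sID, wordID));
    INSERT INTO FTSwordDef VALUES (1, 'alpha'), (2, 'alphabet'), (3, 'beta'), (4, 'gamma');
    INSERT INTO ftsIndexWordOnce VALUES
        (10, 1), (10, 3), (10, 4),
        (20, 1), (20, 3),
        (30, 2), (30, 3), (30, 4),
        (40, 4),
        (50, 1), (50, 2), (50, 3), (50, 4);
""")

def multi_join_sql(n_terms: int) -> str:
    """Hypothetical N-way self-join: one index/word pair per search term."""
    lines = [
        "SELECT DISTINCT i0.sID FROM ftsIndexWordOnce AS i0",
        "JOIN FTSwordDef AS d0 ON d0.ID = i0.wordID AND d0.word LIKE ?",
    ]
    for k in range(1, n_terms):
        lines.append(f"JOIN ftsIndexWordOnce AS i{k} ON i{k}.sID = i0.sID")
        lines.append(f"JOIN FTSwordDef AS d{k} ON d{k}.ID = i{k}.wordID AND d{k}.word LIKE ?")
    return "\n".join(lines)

# Placeholders are bound in textual order: d0, d1, d2, ...
patterns = ["alpha%", "beta", "gamma"]
docs = sorted(s for (s,) in con.execute(multi_join_sql(len(patterns)), patterns))
print(docs)  # [10, 30, 50]
```

DISTINCT matters here. Document 50 contains both alpha and alphabet, so alpha% joins to two rows for it.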

The query performs great with any 4 terms: less than 1 second.
Literally any 4: first 4, last 4, or middle 4.
At 5 or more terms it dies; I killed the query at 2 minutes.
At 6 terms I let it run: 5 minutes, returning 795 rows.

Up to 4 terms, the query plan mixes loop and merge joins.
At 5 or more terms, the query plan is all loop joins.
Is there syntax for applying query hints to an INTERSECT?

I tried splitting the terms into two parenthesized sets of 3, ( ... ) INTERSECT ( ... ), but that did not change the query plan.

  ( -- start term
      select [ftsIndexWordOnce].[sID] 
      from [ftsIndexWordOnce] with (nolock)
      where [ftsIndexWordOnce].[wordID] in ( 
             select [id] from [FTSwordDef] with (nolock) 
                                     where [word] like 'alpha')
  ) -- end term
INTERSECT
  ( -- start term
      select [ftsIndexWordOnce].[sID] 
      from [ftsIndexWordOnce] with (nolock)
      where [ftsIndexWordOnce].[wordID] in ( 
             select [id] from [FTSwordDef] with (nolock) 
                                     where [word] like 'beta')
  ) -- end term
INTERSECT
  ( -- start term
      select [ftsIndexWordOnce].[sID] 
      from [ftsIndexWordOnce] with (nolock)
      where [ftsIndexWordOnce].[wordID] in ( 
             select [id] from [FTSwordDef] with (nolock) 
                                     where [word] like 'gamma')
  ) -- end term
INTERSECT 
  ( -- start term
      select [ftsIndexWordOnce].[sID] 
      from [ftsIndexWordOnce] with (nolock)
      where [ftsIndexWordOnce].[wordID] in ( 
             select [id] from [FTSwordDef] with (nolock) 
                                     where [word] like 'delta')
  ) -- end term
INTERSECT
  ( -- start term
      select [ftsIndexWordOnce].[sID] 
      from [ftsIndexWordOnce] with (nolock)
      where [ftsIndexWordOnce].[wordID] in ( 
             select [id] from [FTSwordDef] with (nolock) 
                                     where [word] like 'epsilon')
  ) -- end term
INTERSECT
  ( -- start term
      select [ftsIndexWordOnce].[sID] 
      from [ftsIndexWordOnce] with (nolock)
      where [ftsIndexWordOnce].[wordID] in ( 
             select [id] from [FTSwordDef] with (nolock) 
                                     where [word] like 'phi')
  ) -- end term
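INTERSECT is associative, so regrouping the six branches as two sets of three cannot change the result. That is consistent with the optimizer treating both forms alike. The sketch below checks only result equivalence, since plans are SQL Server-specific. It runs the question's branch shape in SQLite on made-up data (NOLOCK omitted):

```python
import sqlite3

words = ["alpha", "beta", "gamma", "delta", "epsilon", "phi"]

# Made-up data: document s (0..63) contains word j when bit j of s is set,
# so only document 63 contains all six words.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE FTSwordDef (ID INTEGER PRIMARY KEY, word TEXT NOT NULL)")
con.execute("CREATE TABLE ftsIndexWordOnce (sID INTEGER NOT NULL, wordID INTEGER NOT NULL)")
con.executemany("INSERT INTO FTSwordDef VALUES (?, ?)", enumerate(words, start=1))
con.executemany(
    "INSERT INTO ftsIndexWordOnce VALUES (?, ?)",
    [(s, j + 1) for s in range(64) for j in range(len(words)) if s >> j & 1],
)

# One INTERSECT branch per term, as in the question.
branch = ("SELECT sID FROM ftsIndexWordOnce "
          "WHERE wordID IN (SELECT ID FROM FTSwordDef WHERE word LIKE ?)")

flat_sql = " INTERSECT ".join([branch] * 6)
# SQLite does not accept bare parenthesized SELECTs in a compound query,
# so each group of three is wrapped in a derived table instead.
half = " INTERSECT ".join([branch] * 3)
grouped_sql = f"SELECT sID FROM ({half}) INTERSECT SELECT sID FROM ({half})"

flat = sorted(s for (s,) in con.execute(flat_sql, words))
grouped = sorted(s for (s,) in con.execute(grouped_sql, words))
print(flat, grouped)  # [63] [63]
```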

I think I have a fix:

     select distinct [ftsIndexWordOnce].[sID] 
      from [ftsIndexWordOnce] with (nolock)
      Inner Merge Join [FTSwordDef] with (nolock) 
        On [FTSwordDef].[ID] = [ftsIndexWordOnce].[wordID] 
       And [FTSwordDef].[word] like 'alpha' 
  INTERSECT
     select distinct [ftsIndexWordOnce].[sID] 
      from [ftsIndexWordOnce] with (nolock)
      Inner Merge Join [FTSwordDef] with (nolock) 
        On [FTSwordDef].[ID] = [ftsIndexWordOnce].[wordID] 
       And [FTSwordDef].[word] like 'beta'

The query optimizer still picks a bad plan at 5 or more terms, but this forces the first join to be a merge join, and that saves it.
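Logically, each join-based branch returns the same sIDs as the IN-based branch it replaces. Only the physical join strategy changes. Below is a minimal check with SQLite standing in for SQL Server, on made-up data. SQLite has no MERGE join hint, so the hint is dropped and only the results are compared:

```python
import sqlite3

# Made-up sample data; SQLite stands in for SQL Server (NOLOCK and MERGE omitted).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE FTSwordDef (ID INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE ftsIndexWordOnce (sID INTEGER NOT NULL, wordID INTEGER NOT NULL,
                                   PRIMARY KEY (sID, wordID));
    INSERT INTO FTSwordDef VALUES (1, 'alpha'), (2, 'alphabet'), (3, 'beta'), (4, 'gamma');
    INSERT INTO ftsIndexWordOnce VALUES
        (10, 1), (10, 3), (10, 4),
        (20, 1), (20, 3),
        (30, 2), (30, 3), (30, 4),
        (40, 4),
        (50, 1), (50, 2), (50, 3), (50, 4);
""")
patterns = ["alpha%", "beta", "gamma"]

# Original branch shape: IN (subquery).
in_branch = ("SELECT sID FROM ftsIndexWordOnce "
             "WHERE wordID IN (SELECT ID FROM FTSwordDef WHERE word LIKE ?)")
# Rewritten branch shape: a join with the LIKE in the ON clause
# (the T-SQL version spells this INNER MERGE JOIN).
join_branch = ("SELECT DISTINCT i.sID FROM ftsIndexWordOnce AS i "
               "JOIN FTSwordDef AS d ON d.ID = i.wordID AND d.word LIKE ?")

def run(branch: str) -> list:
    """Chain one copy of the branch per pattern with INTERSECT."""
    sql = " INTERSECT ".join([branch] * len(patterns))
    return sorted(s for (s,) in con.execute(sql, patterns))

via_in, via_join = run(in_branch), run(join_branch)
print(via_in, via_join)  # [10, 30, 50] [10, 30, 50]
```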

JAQFrost · Answer 1 · 2013-06-10T18:28:47.590

You might want to try EXISTS. IN can get expensive, especially with large lists: EXISTS just looks for the first match, whereas IN tries to find them all. If sID is unique in ftsIndexWordOnce, the code below should work. If not, you can either add DISTINCT or group on it.

EDIT: the first script had a logic error; see the comments.

SELECT
    [ftsIndexWordOnce].[sID]
FROM
    [ftsIndexWordOnce] WITH (NOLOCK)
WHERE
    EXISTS
    (
        SELECT
            NULL
        FROM
            [FTSwordDef] WITH (NOLOCK)
        WHERE
            [FTSwordDef].[word] LIKE 'alpha'
            AND
            [FTSwordDef].id = [ftsIndexWordOnce].wordid
    )
INTERSECT
SELECT
    [ftsIndexWordOnce].[sID]
FROM
    [ftsIndexWordOnce] WITH (NOLOCK)
WHERE
    EXISTS
    (
        SELECT
            NULL
        FROM
            [FTSwordDef] WITH (NOLOCK)
        WHERE
            [FTSwordDef].[word] LIKE 'beta'
            AND
            [FTSwordDef].id = [ftsIndexWordOnce].wordid
    )
INTERSECT
SELECT
    [ftsIndexWordOnce].[sID]
FROM
    [ftsIndexWordOnce] WITH (NOLOCK)
WHERE
    EXISTS
    (
        SELECT
            NULL
        FROM
            [FTSwordDef] WITH (NOLOCK)
        WHERE
            [FTSwordDef].[word] LIKE 'gamma'
            AND
            [FTSwordDef].id = [ftsIndexWordOnce].wordid
    )
INTERSECT
SELECT
    [ftsIndexWordOnce].[sID]
FROM
    [ftsIndexWordOnce] WITH (NOLOCK)
WHERE
    EXISTS
    (
        SELECT
            NULL
        FROM
            [FTSwordDef] WITH (NOLOCK)
        WHERE
            [FTSwordDef].[word] LIKE 'delta'
            AND
            [FTSwordDef].id = [ftsIndexWordOnce].wordid
    )
INTERSECT
SELECT
    [ftsIndexWordOnce].[sID]
FROM
    [ftsIndexWordOnce] WITH (NOLOCK)
WHERE
    EXISTS
    (
        SELECT
            NULL
        FROM
            [FTSwordDef] WITH (NOLOCK)
        WHERE
            [FTSwordDef].[word] LIKE 'epsilon'
            AND
            [FTSwordDef].id = [ftsIndexWordOnce].wordid
    )
INTERSECT
SELECT
    [ftsIndexWordOnce].[sID]
FROM
    [ftsIndexWordOnce] WITH (NOLOCK)
WHERE
    EXISTS
    (
        SELECT
            NULL
        FROM
            [FTSwordDef] WITH (NOLOCK)
        WHERE
            [FTSwordDef].[word] LIKE 'phi'
            AND
            [FTSwordDef].id = [ftsIndexWordOnce].wordid
    )
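The performance claim can only be checked against SQL Server's own plans. What can be checked cheaply is that the two branch forms are logically interchangeable. Here is a small SQLite sketch on made-up data (NOLOCK omitted) comparing an INTERSECT of IN branches with an INTERSECT of EXISTS branches:

```python
import sqlite3

# Made-up sample data; SQLite stands in for SQL Server.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE FTSwordDef (ID INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE ftsIndexWordOnce (sID INTEGER NOT NULL, wordID INTEGER NOT NULL,
                                   PRIMARY KEY (sID, wordID));
    INSERT INTO FTSwordDef VALUES (1, 'alpha'), (2, 'alphabet'), (3, 'beta'), (4, 'gamma');
    INSERT INTO ftsIndexWordOnce VALUES
        (10, 1), (10, 3), (10, 4),
        (20, 1), (20, 3),
        (30, 2), (30, 3), (30, 4),
        (40, 4),
        (50, 1), (50, 2), (50, 3), (50, 4);
""")
patterns = ["alpha%", "beta", "gamma"]

# Branch shape from the question: IN (subquery).
in_branch = ("SELECT sID FROM ftsIndexWordOnce "
             "WHERE wordID IN (SELECT ID FROM FTSwordDef WHERE word LIKE ?)")
# Branch shape from this answer: correlated EXISTS.
exists_branch = ("SELECT i.sID FROM ftsIndexWordOnce AS i "
                 "WHERE EXISTS (SELECT NULL FROM FTSwordDef AS d "
                 "WHERE d.word LIKE ? AND d.ID = i.wordID)")

def run(branch: str) -> list:
    """Chain one copy of the branch per pattern with INTERSECT."""
    sql = " INTERSECT ".join([branch] * len(patterns))
    return sorted(s for (s,) in con.execute(sql, patterns))

via_in, via_exists = run(in_branch), run(exists_branch)
print(via_in, via_exists)  # [10, 30, 50] [10, 30, 50]
```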

Good thought but the AND returns 0. Pretty sure it is evaluating a single wordID at a time and that single wordID cannot be both like 'alpha' and like 'beta'. What do you think? — paparazzo, Jun 10 '13 at 17:46
D'oh, right you are. So back to the intersect, but with EXISTS instead of IN. A join with the intersected bits as subqueries could work too. — JAQFrost, Jun 10 '13 at 18:37
Tried EXISTS with INTERSECT, but it still did not fix it. At 5 or more terms the plan goes to all loop joins and dies. But an inner merge join may do the trick. — paparazzo, Jun 10 '13 at 18:48
+1 because you got me thinking about looking at it from another angle. — paparazzo, Jun 10 '13 at 19:45
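The first version of the script is not shown, but the comment above pins down the problem: both LIKE conditions end up applied to the same FTSwordDef row, and one word cannot match 'alpha' and 'beta' at once. The sketch below uses SQLite, made-up data, and a guessed reconstruction of the faulty query. It shows that failure next to the reply's idea of joining the per-term results as subqueries:

```python
import sqlite3

# Made-up sample data; SQLite stands in for SQL Server (NOLOCK omitted).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE FTSwordDef (ID INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE ftsIndexWordOnce (sID INTEGER NOT NULL, wordID INTEGER NOT NULL,
                                   PRIMARY KEY (sID, wordID));
    INSERT INTO FTSwordDef VALUES (1, 'alpha'), (2, 'alphabet'), (3, 'beta'), (4, 'gamma');
    INSERT INTO ftsIndexWordOnce VALUES
        (10, 1), (10, 3), (10, 4),
        (20, 1), (20, 3),
        (30, 2), (30, 3), (30, 4),
        (40, 4),
        (50, 1), (50, 2), (50, 3), (50, 4);
""")

# Guessed reconstruction of the faulty version: both patterns tested on one word row.
single_exists = """
    SELECT DISTINCT i.sID FROM ftsIndexWordOnce AS i
    WHERE EXISTS (SELECT NULL FROM FTSwordDef AS d
                  WHERE d.word LIKE ? AND d.word LIKE ? AND d.ID = i.wordID)
"""
# Sketch of the suggested alternative: one derived table per term, joined on sID.
term = ("SELECT DISTINCT i.sID FROM ftsIndexWordOnce AS i "
        "JOIN FTSwordDef AS d ON d.ID = i.wordID AND d.word LIKE ?")
joined = f"SELECT t1.sID FROM ({term}) AS t1 JOIN ({term}) AS t2 ON t2.sID = t1.sID"

wrong = sorted(s for (s,) in con.execute(single_exists, ["alpha", "beta"]))
right = sorted(s for (s,) in con.execute(joined, ["alpha", "beta"]))
print(wrong, right)  # [] [10, 20, 50]
```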

score 0 · Answer 2 · answered Sep 02 '18 at 09:03

I've had a very similar problem to yours.

I wanted to know what the common value was for an arbitrary number of intersects.

I could not use dynamic SQL due to the performance loss.

So I came up with a pattern as follows:

WITH CTE AS (
    SELECT [FTSwordDef].[ID], -- Unique identifier of the referenced entity
           [ftsIndexWordOnce].[sID] -- The field that should be common for the intersect
    FROM [FTSwordDef]
    INNER JOIN [ftsIndexWordOnce]
        ON [FTSwordDef].[ID] = [ftsIndexWordOnce].[wordID]

    -- All your intersects become one query that returns all the participants of interest
    WHERE [FTSwordDef].[word] IN ('alpha','beta','gamma','delta','epsilon','phi')   

    -- Optional GROUP BY if you don't trust the integrity of 
    -- your data and fear duplicate data may be present
    GROUP BY [FTSwordDef].[ID],[ftsIndexWordOnce].[sID]
)

SELECT [MAIN].[sID] -- The common value you wanted with the intersect
FROM CTE [MAIN]

-- We count the amount of participating entities (N Amount of intersect)
CROSS APPLY(
    SELECT COUNT(DISTINCT [A].[ID]) C FROM CTE [A]
) [A]

-- We count the occurrences of the common value
CROSS APPLY(
    SELECT COUNT([B].[sID]) C FROM CTE [B] WHERE [B].[sID] = [MAIN].[sID]
) [B]

-- If the value we want in common has equal occurrences as the amount of 
-- participating referenced entities, we can say the value is common for all
WHERE [A].[C] = [B].[C]
GROUP BY [MAIN].[sID]

Here is a fiddle http://sqlfiddle.com/#!18/2f1d9/55
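SQLite has no CROSS APPLY, so the sketch below rewrites the two applies as scalar subqueries, with shortened aliases. It checks the pattern's logic against a plain INTERSECT on made-up data. It also shows one behavior worth checking before relying on the pattern. The participant count comes from the words that actually occur in the index, not from the number of requested terms, so a term with no matches (here 'delta', absent from the sample data) is effectively ignored:

```python
import sqlite3

# Made-up sample data; SQLite stands in for SQL Server.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE FTSwordDef (ID INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE ftsIndexWordOnce (sID INTEGER NOT NULL, wordID INTEGER NOT NULL,
                                   PRIMARY KEY (sID, wordID));
    INSERT INTO FTSwordDef VALUES (1, 'alpha'), (2, 'alphabet'), (3, 'beta'), (4, 'gamma');
    INSERT INTO ftsIndexWordOnce VALUES
        (10, 1), (10, 3), (10, 4),
        (20, 1), (20, 3),
        (30, 2), (30, 3), (30, 4),
        (40, 4),
        (50, 1), (50, 2), (50, 3), (50, 4);
""")

ANSWER2_SQL = """
WITH CTE AS (
    SELECT d.ID, i.sID
    FROM FTSwordDef AS d
    JOIN ftsIndexWordOnce AS i ON d.ID = i.wordID
    WHERE d.word IN ({marks})
    GROUP BY d.ID, i.sID
)
SELECT m.sID
FROM CTE AS m
-- CROSS APPLY [A] and [B] from the answer, rewritten as scalar subqueries
WHERE (SELECT COUNT(DISTINCT A.ID) FROM CTE AS A)
    = (SELECT COUNT(B.sID) FROM CTE AS B WHERE B.sID = m.sID)
GROUP BY m.sID
"""

def answer2(words: list) -> list:
    sql = ANSWER2_SQL.format(marks=", ".join("?" * len(words)))
    return sorted(s for (s,) in con.execute(sql, words))

def intersect(words: list) -> list:
    branch = ("SELECT sID FROM ftsIndexWordOnce "
              "WHERE wordID IN (SELECT ID FROM FTSwordDef WHERE word = ?)")
    sql = " INTERSECT ".join([branch] * len(words))
    return sorted(s for (s,) in con.execute(sql, words))

three = ["alpha", "beta", "gamma"]
four = three + ["delta"]
print(answer2(three), intersect(three))  # [10, 50] [10, 50]
print(answer2(four), intersect(four))    # [10, 50] []
```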
