SQL Performance IN vs EQUAL vs UNION - revisited

Question

Some years ago Performance differences between equal (=) and IN with one literal value there was discussion around IN and EQUAL for a single value.

My issue is where there is a (small) list of values for a select. The use case is to load data into an analytics model for a selection of 'instances'.

  SELECT *
  FROM [v_location_fact]
  where instanceID = '136'

This returns 22,425 rows in 1.4 seconds.

The execution plan:https://www.brentozar.com/pastetheplan/?id=r1ouDwny9

The same result and query works for instanceID = '137'

SELECT *
  FROM [v_location_fact]
  where instanceID = '136'
  UNION ALL
SELECT *
  FROM [v_location_fact]
  where instanceID = '137'

Works as expected and returns the combined set 45167 rows in 2.5 seconds. This uses essentially the same plan as above, but twice.

But

SELECT *
  FROM [v_Location_fact]
  where instanceID in ('136', '137')

Takes 33 seconds to return the same data.

The execution plan: https://www.brentozar.com/pastetheplan/?id=SJ9OOw3Jc

With inner join:

declare @ids table (selID varchar(20));
insert into @ids(selID) values ('136'),('137');
SELECT *
  FROM [v_Location_fact]
  INNER JOIN @ids ON instanceID = selID

also takes 30 seconds to return the same data. Execution plan: https://www.brentozar.com/pastetheplan/?id=BJ4uFw2yq

Each table used by the view v_location_fact has instanceID as the first element of an index.

In our use case, there may be up to 10 instances in the list of selections so the UNION syntax doesn't work from PowerQuery as this would require a variable number of UNION clauses.

Any suggestions how the 'straight' line performance of the UNION (2.5 seconds) can be emulated with a IN or JOIN clause with the small number of instances? (In this example taking 12 times longer, but in more real-world examples taking hundreds of times longer where the tables are much larger).

Have you reviewed the execution plan to see where the time is being consumed? Add the plans to your question using [PasteThePlan](https://www.brentozar.com/pastetheplan) — Stu, Feb 17 '22 at 23:52
Without looking in any detail I can see the majority of the cost is due to many tables being heaps with no clustered index and expensive RID lookups, some index tuning will likely yield considerable improvement. — Stu, Feb 18 '22 at 00:40
The Plan can't match with your query, There is a lot of `Key Lookup` that might cause a lot of IO from your plan, could you provide more detail (index & real query ), Is `v_Location_fact` a view? — D-Shih, Feb 18 '22 at 00:41
@D-Shih, yes, v_location_fact is a complex view, but all tables involved have instanceID as the first element of a key and all joins use matched index keys. Cardinalities for 5 tables are > 100,000 rows. There are 25 distinct values for instanceID. — sch56, Feb 18 '22 at 01:25
@Stu - Having reviewed all the indexes, they are all precisely fit for purpose for this query. Adding clustered indexes makes no difference. The key point is that the '=' query (on its own or with the UNION ALL) is efficient, but the other approaches take more than 12 times as long to return the same data. There appear to be 3 cumulative contributions: 1. First plan makes much better use of parallel threads (CPU time > 3 * elapsed time), 2. Number of Logical reads in the first case is about 40% of the other two, 3. Table Spool nodes in the latter plans appear to add a significant overhead. — sch56, Feb 18 '22 at 03:16
so why an ID has to be an string???? maybe that's your problem — AbbathCL, Feb 18 '22 at 03:30

score 0 · Answer 1 · answered Feb 21 '22 at 00:19

I resorted to query hints. Specifying 'left hash join' for two sub-queries has given a result that is much more linear in time with the individual instance queries.

e.g.

left hash join (
    select instanceID, loc, product, period, sum(receipts) K13_IntProcReceipt
    from xopt_ProcRec_out
    group by instanceID, loc, product, period) pro
    on pro.instanceID = il.instanceID
    and pro.loc = il.loc
    and pro.product = il.product
    and pro.period = il.period

Thanks for all the input.

SQL Performance IN vs EQUAL vs UNION - revisited

1 Answers1