How can I union values across multiple columns of a result set without repeated table scans in the query plan?

Question

I have data in table Foo in columns x1 and x2 and x3, and more data in table Bar in x4. Both tables are primary-keyed by a unique id, and each Bar has a foreign key to a single Foo, as in the schema below:

CREATE TABLE Foo (id INT, x1 INT, x2 INT, x3 INT, ...)
CREATE TABLE Bar (id INT, fooId INT, x4 INT, ...)

CREATE TABLE Qux (x INT, ...)

This is the right schema, and it's properly normalized for its use case.

I need a set of all distinct x values from Foo and Bar where the Foo records match some WHERE clause. I then need to use those values to look up the correct records in another table Qux.

I had solved it with UNION ALL, as in the example below:

WITH CTE_Ids AS (
    SELECT x1 AS x FROM Foo WHERE ...
    UNION ALL SELECT x2 AS x FROM Foo WHERE ...
    UNION ALL SELECT x3 AS x FROM Foo WHERE ...
    UNION ALL SELECT x4 AS x FROM Foo f LEFT OUTER JOIN Bar b ON f.id = b.fooId WHERE ...
),
CTE_UniqueIds AS (
    SELECT DISTINCT x FROM CTE_Ids
)
SELECT q.*
FROM CTE_UniqueIds ids
INNER JOIN Qux q ON ids.x = q.x

This produces the right result set, and I don't mind repeating the WHERE clause in the code — but unfortunately, it's very inefficient, scanning the Foo and Bar tables many times, because SQL Server doesn't realize it could scan the data exactly once. The inefficient query plan is bad enough that we're experiencing significant slowdowns in our production software.

So how can I get the unique set of x values unioned across columns without SQL Server scanning each table multiple times?

you can use VALUES construct to create multiple rows of columns x, x2,x3,x4. The only question is that if you want to inner join Bar for all 4 columns. Something like: select * from foo cross apply (values(x),(x2),(x3)) y(z) — siggemannen, Jun 16 '23 at 15:38
@siggemannen I spent some time searching for a good solution to this, and after finding it, posted both this question-and-answer so others wouldn't have to do the same research. Abusing `OUTER APPLY` and `UNION ALL` to union horizontally does exactly what you want, and it scales well to more complex versions of the same problem (like I _actually_ have in real life). — Sean Werkema, Jun 16 '23 at 15:42

Sean Werkema · Answer 1 · 2023-06-16T15:48:38.513

I puzzled over this for a while: It seemed like there ought to be a way to be able to simply write

SELECT
    f.x1, f.x2, f.x3, b.x4
FROM Foo f
    LEFT OUTER JOIN Bar b ON b.fooId = b.id
WHERE ...

and then somehow tell SQL Server to union all of the X columns across each resulting row into a unique result set of X values. I searched awhile, and I eventually found part of a solution to it in @MatBailie's solution for another question, and I expanded his solution to the answer below.

The key to unioning horizontally across columns is by abusing the OUTER APPLY operator, which can produce multiple result rows for each input row, and using it and UNION ALL to combine each result row with itself multiple times:

SELECT u.x
FROM Foo f
    LEFT OUTER JOIN Bar b ON b.fooId = b.id
    OUTER APPLY (
        SELECT f.x1 AS x
        UNION ALL SELECT f.x2 AS x
        UNION ALL SELECT f.x3 AS x
        UNION ALL SELECT b.x4 AS x
    ) AS u
WHERE ...
GROUP BY u.x

You can use DISTINCT at the top or GROUP BY at the bottom (I prefer GROUP BY, as SQL Server can sometimes optimize that better) to produce the unique set of x values if there are duplicates.

The full query would be structured something like this:

WITH CTE_Ids AS (
    SELECT u.x
    FROM Foo f
        LEFT OUTER JOIN Bar b ON b.fooId = b.id
        OUTER APPLY (
            SELECT f.x1 AS x
            UNION ALL SELECT f.x2 AS x
            UNION ALL SELECT f.x3 AS x
            UNION ALL SELECT b.x4 AS x
        ) AS u
    WHERE ...
    GROUP BY u.x
)
SELECT q.*
FROM CTE_Ids ids
INNER JOIN Qux q ON ids.x = q.x

The query plan for the above will only scan each of the correct Foo and Bar records exactly once, and then just perform some sorting and filtering in memory on the result, before then joining the unique set of resulting x values to Qux.

And if the original `WHERE` conditions are column-specific, those conditions can still be applied within the `OUTER APPLY (...)`. (Might need to change the `OUTER APPLY` be a `CROSS APPLY` depending on how you want to handle the no-value case.) — T N, Jun 16 '23 at 15:41

siggemannen · Answer 2 · 2023-06-16T16:13:36.747

0

A little example of the values construct:

declare @foo table (id int primary key, x1 int, x2 int, x3 int)
declare @bar table (id int primary key, fooid int, x4 int)

insert into @foo values(1, 10,20,30)
,   (2, 5, 10, 40)
,   (3, 1, 1, 1)
,   (4, 10,20,20)

insert into @bar
values  (1, 1, 100)
,   (2, 1, 1337)
,   (3, 2, 66)
,   (4, 3, 1000)
,   (5, 4, 1)


select distinct z.x1
from @foo f
left join @bar b
    ON  b.fooid = f.id
cross apply (
    values(f.x1), (f.x2), (f.x3), (b.x4)
    ) z
where z.x1 between 5 and 60

I think it's often much more terser than UNION ALL. But both do the job indeed!

edited Jun 16 '23 at 16:13

answered Jun 16 '23 at 15:46

siggemannen

3,884
2
6
24

That is a good solution as well, and it's a shame how poorly documented the `VALUES` clause is. The `UNION ALL` solution has the benefit that it has no size caps, while `VALUES` is limited to 1000 columns. But it looks like they both produce the exact same query plan. – Sean Werkema Jun 16 '23 at 15:58
You have a syntax error in your answer: You need to write `cross apply (...) z(x)` to have the column-rename work correctly, or `z.x` will be an unbound identifier. – Sean Werkema Jun 16 '23 at 15:59
nope, it works without it if f.x is a "column" name and not an expression – siggemannen Jun 16 '23 at 16:10
The original column names in the question were `x1`, `x2`, `x3`, and `x4`. There was no `x`, so a rename is necessary to get them all to line up. – Sean Werkema Jun 16 '23 at 16:12
yeah sorry, i have a bad habit of never naming the "first column" by number :) Edited the answer to match your cols – siggemannen Jun 16 '23 at 16:14
In the original problem that inspired the question, the column names don't have numbers: They're effectively `flarbId` and `grobId` and `nitzId`, and have nothing in common other than that they're all IDs. I included a rename to `x` (to `id`) in my answer because someone searching for the question likely has a similar problem, and needs the rename just like I needed it. – Sean Werkema Jun 16 '23 at 16:20

How can I union values across multiple columns of a result set without repeated table scans in the query plan?

2 Answers2