I have a query that creates a sorted dictionary (sorted in that there is an incremental id
to identify relative position of the keys).
I then wish to know, for each row, if the value
exists as a key
in any other row later on in the dictionary. I'm doing that with a correlated query in a CROSS APPLY
. Effectively a self join on a CTE.
As far as I understand it, this means that the CTE that represents the Dictionary will have to be computed twice?
Other than using table variables (it's inside a function), does anyone have any alternative suggestions?
WITH
dictionary([id], [key], [val]) AS
(
SELECT 1, 'a', 'b'
UNION ALL SELECT 2, 'b', 'c'
UNION ALL SELECT 3, 'c', 'a'
UNION ALL SELECT 4, 'x', 'w'
UNION ALL SELECT 5, 'y', 'x'
UNION ALL SELECT 6, 'z', 'y'
)
SELECT
*
FROM
dictionary dict
CROSS APPLY
(
SELECT COUNT(*) FROM dictionary WHERE dictionary.id > dict.id AND dictionary.[key] = dict.[val]
)
lookup(hits)
CROSS APPLY
(
SELECT 1, 3 WHERE lookup.hits = 0
UNION ALL
SELECT 1, 2 WHERE lookup.hits > 0
UNION ALL
SELECT 2, 3 WHERE lookup.hits > 0
)
map([from], [to])
-- [key]s 'c', 'x', 'y' and 'z' should only have one output rows
-- It's "acceptable" for only 'z' to have just one output row IFF a self join can be avoided
Other options that I can think of are all variants on the self-join...
dictionary dict
LEFT JOIN
(
SELECT key, MAX(id) AS id FROM dictionary GROUP BY key
)
lookup
ON lookup.key = dict.value
AND lookup.id > dict.id
Or...
dictionary dict
OUTER APPLY
(
SELECT 1 WHERE EXISTS (SELECT * FROM dictionary WHERE dictionary.id > dict.id AND dictionary.key = dict.value)
)
lookup(hits)
I am, however, trying to avoid the self join of a CTE, possibly with windowed functions the I haven't thought of? Anything just to avoid the CTE being computed twice...
(Ignoring the lookup.id > dict.id
aspect is fine, if it means avoiding the self join...)
EDIT: More complete example added, and a SQL Fiddle too, thanks to @MartinSmith pointing out some inconsistencies...
http://sqlfiddle.com/#!6/9eecb7db59d16c80417c72d1e1f4fbf1/17407