In SQL For Smarties, Joe Celko provides an ANSI SQL definition of a Series table (elsewhere called Tally or Numbers). His definition ensures the values in the column are unique, positive and contiguous from 1 up to the maximum value:
CREATE TABLE Series (
seq INTEGER NOT NULL PRIMARY KEY,
CONSTRAINT non_negative_nbr CHECK (seq > 0),
CONSTRAINT numbers_are_complete CHECK ((SELECT COUNT(*) FROM Series) = (SELECT MAX(seq) FROM Series))
);
Uniqueness is ensured by the PRIMARY KEY declaration. Positivity is ensured by the constraint non_negative_nbr
. With these two constraints in place, contiguity is ensured by the constraint numbers_are_complete
.
SQL Server does not support subqueries in check constraints. When I try to create the Series table, I receive an error like this:
Msg 1046, Level 15, State 1, Line 4
Subqueries are not allowed in this context. Only scalar expressions are allowed.
Msg 102, Level 15, State 1, Line 4
Incorrect syntax near ')'.
If I remove the unsupported constraint numbers_are_complete
, I'm left with this definition:
CREATE TABLE Series (
seq INTEGER NOT NULL PRIMARY KEY,
CONSTRAINT non_negative_nbr CHECK (seq > 0)
);
When I try to create this version of Series, it succeeds:
Command(s) completed successfully.
This version of Series is weaker because it doesn't enforce contiguity of the numbers in the table.
To demonstrate this, first I have to populate the table. I have adapted a technique described by Itzik Ben-Gan in his article 'Virtual Auxiliary Table of Numbers' to do this efficiently for 65,536 rows:
WITH
N0(_) AS (SELECT NULL UNION ALL SELECT NULL),
N1(_) AS (SELECT NULL FROM N0 AS L CROSS JOIN N0 AS R),
N2(_) AS (SELECT NULL FROM N1 AS L CROSS JOIN N1 AS R),
N3(_) AS (SELECT NULL FROM N2 AS L CROSS JOIN N2 AS R),
N4(_) AS (SELECT NULL FROM N3 AS L CROSS JOIN N3 AS R)
INSERT INTO Series (
seq
)
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM N4;
The query produces output like this:
(65536 row(s) affected)
Now I can select from the table like this to produce 65,536 rows:
SELECT seq
FROM Series;
I've truncated the result set, but it looks like this:
seq
1
2
...
65535
65536
Check it for yourself, and you'll see that every number in the interval [1, 65536] is in the result set. The series is contiguous.
But I can break contiguity by deleting any row that is not an endpoint of the range:
DELETE FROM Series
WHERE seq = 25788;
If contiguity were enforced, this statement would raise an error, but instead it succeeds:
(1 row(s) affected)
It would be difficult for a human to find the missing value by visual inspection. They would have to suspect that a value were missing in the first place before going to the trouble. For these reasons, tampering with the Series data is an easy way to introduce subtle bugs into an SQL Server application that relies on the Series table being contiguous.
Assume a user has written a query that reads from Sequence to enumerate rows from another source. After my tampering, that query would now produce incorrect results around a certain value - by the 25,788th row, everything is off by one.
It is possible to write a query to detect missing values in the Series table, but how do I constrain the table so that missing values are impossble?