0

So, we're using a postgres database as pretty much our entire backend. We have lots of check constraints on the various columns to make sure that any data entered is valid.

However, what about checks on primary keys? Does postgres or the SQL standard in general impose any particular conditions automatically? Seems to me that setting a SERIAL PRIMARY KEY only has the constraint that values must be unique, but there is at least the general principle that IDs are greater than zero. This could be covered with a constraint like

ALTER TABLE table_name ADD CONSTRAINT check_id_positive CHECK ( id > 0 );

which makes sense in theory but seems a little like overkill to me. Any guidance? Again, like I say, we want to really max out the data validation as much as we can given the use case here. I'm just not sure if this is totally crazy and excessive or not.

RonYitti
  • 3
  • 1
  • If your primary key column is also an auto increment column, then wouldn't the database be handling it already without any need for check constraints? – Tim Biegeleisen Jul 11 '18 at 11:32
  • The actual value of a generated primary key is completely irrelevant. There is no implied (or required) restriction on positive numbers. `-365464` is just as good as `42` as a generated primary key value. –  Jul 11 '18 at 11:33
  • 1
    `general principle that IDs are greater than zero.` This is not needed. It is fine for ids to be negative or zero. But it *could* be part of a (senseless) *business-rule* – wildplasser Jul 11 '18 at 11:34
  • @TimBiegeleisen yes, but front-end team will have access to insert operations on tables and could in theory specify some negative ID. Presumably they wouldn't but I figure why not make it as bulletproof as possible. I could also restrict them from specifying ID, though for the in-between layer we use for this it would be a bit less elegant. – RonYitti Jul 11 '18 at 11:36
  • @a_horse_with_no_name (and wildplasser) - Hm. Well, I always just took it as common practice/expectation that IDs will be positive. I suppose there really may be no issue with negative IDs apart from my personal sense of irrational 'ickyness' about them. – RonYitti Jul 11 '18 at 11:38
  • 1
    Well,they can specify a negative Id. They can supply a positive Id. They can supply a zero Id. In most cases, they should do neither. – wildplasser Jul 11 '18 at 11:39

2 Answers2

1

Generally, you should enforce as many relevant rules as possible in the database, otherwise they will be violated. Enforcing integrity with declarative constraints in the database is far more simple, reliable and efficient than doing it procedurally in some application or UI layer. It can even make queries faster, depending on how adept your DMBS is at semantic optimization.

More specifically, you seem to be asking about constraints on surrogate keys, not necessarily primary keys. Whether a key is primary or not has little relevance for anything at all, so I suggest you edit your question.

Speaking generally again, constraints on keys (or key columns) are of course valid and useful. Natural keys often have a specific format which should be enforced in the database, preferably as a type constraint, if your DBMS supports user-defined types.

But a surrogate key usually has no meaning or structure in and of itself—its only requirement is that its values be unique. There isn't even any logical reason for it to be an integer; it could be anything at all, though integers are often chosen for practical reasons. Thus, for a "pure" surrogate key, there are by definition no rules to enforce, apart from uniqueness and the trivial type constraint, which your DBMS already handles.

Your reasons for wanting to enforce positive-valued ids seem to be aesthetic, not based on real-world requirements. Would it actually be an error (an inconsistency) if you encountered a zero- or negative-valued id? If so, by all means add the constraint (but document the need for it). If not, don't.

One last thing: If a surrogate key value is ever entered manually (e.g. in some UI, or in ad-hoc queries), auto-incrementing integer surrogates are dangerous. It only takes a trivial typing mistake—omission, addition, transposition—to identify the wrong tuple. In such cases, your surrogate id should include one or more check digits/characters to catch most (though of course not all) such mistakes; and the validity of these ids should definitely be checked by a constraint.

Jon Heggland
  • 274
  • 1
  • 7
0

You're asking for guidance, and you're not providing many ways to evaluate the recommendation, so this is as much opinion as anything else...

SQL (and postgres) require nothing more from a primary key than that it is unique. Common practice requires it to be immutable.

Adding additional constraints beyond uniqueness to primary keys is a bad idea - it embeds business rules (which are inherently subject to change) into the very schema of your database. This is why most people recommend automatically generated surrogate keys - there's less risk of input error (oh, I switched two digits in your social security number), or business change (we now allow two people to share the same social security number because we have people from abroad in the system, and cannot guarantee that a Canadian social security number is unique from a US social security number).

Adding constraints because something feels "icky" seems unreasonable - it makes your database harder to use for people who don't share your particular sense of ick - those poor frontend developers trying to work out why a perfectly reasonable looking insert statement fails now have to read even more of the manual.

There is a small performance penalty on using constraints - this shouldn't be a huge concern, but still worth noting.

Finally, as a general rule, I favour enforcing "relational" logic via database mechanisms - not null, unique, can be found in foreign key tables - but to leave business logic outside the database (must be > 0, should contain at least 1 number and one special character, must conform to a regex).

Business rules tend to be much more volatile than we ever believe, and a database schema change is more risky than changing code outside the database.

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • Excellent answer. In *some* cases, it can be handy to imply business rules on (components of) keys. Best practice in that case is IMHO to use DOMAINs, and refer to these. See: https://stackoverflow.com/a/10541467/905902 – wildplasser Jul 11 '18 at 13:23
  • This is very dangerous advice. How do you distinguish between "relational" and "business" logic? How are surrogate keys less prone to input error than natural keys? Are you claiming that enforcing integrity outside the database is less costly that doing it inside, or just that there is a performance penalty to enforcing integrity as opposed to _not_ enforcing integrity? Are you seriously recommending not declaring social security number unique? Why is it a problem that rules encoded in the database is subject to change? – Jon Heggland Jul 12 '18 at 18:23