Acceptable way to get around a UNIQUE w NULL constraint?

Question

I have a table as follows:

I would like to add a UNIQUE KEY on (connection_id, parent_container_id, name). However, parent_container_id and connection_id are a disjoin union -- the record must have one or the other. Because of this, I thought perhaps using the value 1 as the 'null' value (that is, the first entry when using an auto-incrementing ID) and creating a BASE entry in the container table for the 1 record.

Example data that I want to enforce uniqueness on:

(connection_id=1, parent_container=null, name="hello")
(connection_id=1, parent_container=null, name="hello") # should fail
(connection_id=null, parent_container=10, name="goodbye")
(connection_id=null, parent_container=10, name="goodbye") # should fail

Is this a poor approach to use or does this seem like a good way to enforce the Unique-ness constraint here? If it's not a good way, what might be a better way?

Update: my current solution is now using a generated (virtual) column with an md5 hash:

ALTER TABLE container ADD unique_hash2 CHAR(32) GENERATED ALWAYS AS 
(MD5(CONCAT(COALESCE(connection_id, '-1'), COALESCE(parent_container_id, '-1'), name))) VIRTUAL UNIQUE

Does MySQL allow a *unique index* (different from a *unique key*) to have NULL key values? — user2864740, Jan 09 '22 at 08:36
@user2864740 well it 'allows it' but if the value is ever null the unique constraint is effectively turned off. — David542, Jan 09 '22 at 08:37
I've used negative id values for placeholders. Such as None, Not Applicable, Not Specified, etc. That matches your suggestion of using 1 to associate to NULL. I dislike having hashes in keys or constraints, their random order and relatively large size often lead to undesirable side effects. — MatBailie, Jan 09 '22 at 10:29
You could also have the parent be "itself" rather than NULL. — MatBailie, Jan 09 '22 at 10:33
"Is this a poor approach to use ...?" Yes. It suggests you haven't normalised the table design properly. It's quite common to need nullable `parent_xxx` columns (because not every entity has a parent) -- but in that case, the `parent_xxx` is not part of a key. So how can you have `container`s with the same `connection_id, name` but differing (non-null) `parent_container_id`? Could you have same `connection_id, name` but one `container` with a `parent_container_id`, another with `null`? What would be the business scenario there? — AntC, Jan 09 '22 at 21:31
This is not clear. How is "may be NULL" a problem? What does "using the value 1 as the 'null' value" mean? " a poor approach to use" to accomplish what why? Why do you think UNIQUE KEY doesn't enforce uniqueness? What exactly do you mean by "uniqueness"--it has a SQL technical meaning where 2 NULLs are each considered unique. Please [use text, not images/links, for text--including tables & ERDs](https://meta.stackoverflow.com/q/285551/3404097). Use images only for what cannot be expressed as text or to augment text. Include a legend/key & explanation with an image. — philipxy, Jan 09 '22 at 21:33
@AntC thanks for the response. `parent_id` and `connection_id` are a union type, and the data must have either one or the other. — David542, Jan 09 '22 at 22:40
@philipxy thanks for the feedback -- yes I mean I would essentially want `name,(connection_id | parent_id)` to be the unique key in this table, where `(connection_id|parent_id)` is a tagged union. I also added a few examples of how I want the constraints to be enforced in the question. — David542, Jan 09 '22 at 22:46
"union type, and the data must have either one or the other". Ok that's the shibboleth for failing to normalise. Don't do that. Vertically partition into two (or more) tables. — AntC, Jan 09 '22 at 23:24
Your edits haven't resolved the issues in my comments. PS It's hard to see how when clear this isn't going to be a duplicate of the many many Q&A re many DB designs for hierarchical data and recursively querying hierarchies of indefinite depth, since these address relational designs for hierarchies. Your recent saying you've seen some notwithstanding. See the [help], [meta] & [meta.se] re disagreeing with others' claims of questions being duplicates. PS 1st thing you say is you want UNIQUE KEY then immediately say (unclearly) you don't. You have things in mind you don't say throughout. — philipxy, Jan 10 '22 at 00:08
@AntC the issue is that I have a FK of a child table that can link to that. So if I split it into two tables, the child table would have another problem of having to link to two different tables. — David542, Jan 10 '22 at 00:29
OK so you have denormalised data all over the schema. Don't do that. All you've done is paint yourself into a corner before coming to ask how to get out across the wet paint. As @philipxy says, this is turning into a DB/schema design question. Then you're not presenting enough info to answer. — AntC, Jan 10 '22 at 00:45
"FK" [sic] to multiple tables & radio button FKs to multiple tables are also faqs & also typically anti-patterns for representing subtypes. Your recent rejecting relevance of such Q&A again notwithstanding. (Your recent questions seem to involve both subtypes & hierarchies (and of course a hierarchy implicitly defines subtypes).) Suggest 1. identify & describe the basic hierarchies & typing involved as graphs of nodes & edges & 2. follow a published presentation of an information modelling & DB design method & ask 1 specific researched non-duplicate question re where you are 1st stuck. — philipxy, Jan 10 '22 at 00:53

score 5 · Accepted Answer · edited Jan 30 '22 at 13:28

Relational Solution

Responding to the relational-database tag, in that context. SQL is the data sublanguage defined by Codd in his Relational Model, that was established as a Standard in the 1980's. Thus SQL is Relational; Relational is SQL.

All items in this answer are Relational and can be implemented easily in SQL, we have had them for decades.

Is this a poor approach to use or does this seem like a good way to enforce the Unique-ness constraint here? If it's not a good way, what might be a better way?

Yes, it is poor, on two counts. First, you are trying to do too many things in one file. Second, it is not Relational, and there is a clean; logical way to do that within the Relational paradigm: Therefore your intended addition is problematic.

this is not a database design question, but strictly answering what is wrong, with an explanation.

Container
container_id; name and parent_container_id indicate a simple, single-parent hierarchy. That is one Fact.
- since you have more than one data tree or hierarchy, each of which has a Root, you need an Anchor row with container_id 0, that the roots are children of, and to inform the recursive Function that constructs the Path or PathName when to stop. This is an Anchor row, it means something, it is not a fiddle to avoid Null.
- the AK ( parent_container_id, name ) is required to prevent duplicate names within a parent_container_id. That is one Predicate. Do not double up and try to do more with it, add another Predicate instead.
Connection
connection_id is a separate discrete Fact. I presume it applies only to a Root. So this is a Subordinate Fact, which is optional to [1]. Its integrity is maintained by a constraint that calls a Function (same as in the Subtype doc) that checks the parent is a Root.
- Null is a red flag indicating incomplete Normalisation, if you Normalise the data, you will not store a Null. (Nulls in result sets are allowed, of course.)
- Separately, as a rule, since Null is the unknown, the notion of a Key (UNIQUE) that is unknown, is hysterical. That is why the Standard prohibits it.
"Union Type"
In the Relational paradigm, the concept of "union type" is completely bankrupt. It is a method used by academics, who do not understand the Relational Model or SQL, who teach and promote 1960's Record Filing Systems and database systems that do not implement the SQL standard correctly. Since those database systems do not support SQL (Constraint that calls a Function), the method is an alternative that is possible: an additional index in the parent and an additional Foreign Key in the child. Horrendous at every level. Do not use it as a generic method.
- note that the additional index is perverse in the index sense (container_id is already unique; adding something to an unique field does not make it more unique), its only purpose is to allow a Foreign Key in the child, that in turn is checked for some condition (here, that the parent is a Root).
- obviously, if your "SQL" isn't SQL, use the pretend "SQL" method
Disjunction
To implement an OR or XOR Gate (formal disjunction), use a proper Subtype cluster (instead of overloading the already poor file that you have).

Data Model

foo

This model provides a relational-database solution, which implies SQL, and therefore proper Constraints that can call a Function.

It is impaired because the "Primary Key" is not a Key but a physical RecordId, that does not have the properties of a logical Key. The consequence will be noticed in child tables.

foo

This model provides a solution that is possible in MySQL.

the typical 1960's pre-Relational, pre-DBMS, physical Record Filing System, that is marketed by the academics as "relational",
with the additional index and FKs to substitute for SQL Constraints, for use in the databases that do not implement SQL correctly

Notation

All my data models are rendered in IDEF1X, available from the early 1980's, the one and only Standard for Relational Data Modelling 1993.
The IDEF1X Introduction is essential reading for those who are new to Codd's Relational Model, or its modelling method. Note that IDEF1X models are complete, they are rich in detail and precision, showing all required details, whereas a home-grown model, being unaware of the imperatives of the Standard, have far less definition. Which means, the notation needs to be fully understood.

Lukasz Szozda · Answer 2 · 2022-01-15T12:33:29.233

Enforcing such requirements with:

two unique indexes
check constraint

Table definition:

CREATE TABLE tab(
   container_id          INT PRIMARY KEY AUTO_INCREMENT,
   connection_id         INT,
   parent_container_id   INT,
   name          VARCHAR(100),
   UNIQUE INDEX (connection_id, name),
   UNIQUE INDEX (parent_container_id, name),
   CHECK ((connection_id IS NULL AND parent_container_id IS NOT NULL)
           OR
          (connection_id IS NOT NULL AND parent_container_id IS NULL))
);

db<>fiddle demo

Data test:

INSERT INTO tab(connection_id, parent_container_id, name)
VALUES (1, NULL, 'hello');

INSERT INTO tab(connection_id, parent_container_id, name)
VALUES (1, NULL, 'hello');
-- Duplicate entry '1-hello' for key 'tab.connection_id'


INSERT INTO tab(connection_id, parent_container_id, name)
VALUES (NULL, 10, 'goodbay');

INSERT INTO tab(connection_id, parent_container_id, name) 
VALUES (NULL, 10, 'goodbay');
-- Duplicate entry '10-goodbay' for key 'tab.parent_container_id'


INSERT INTO tab(connection_id, parent_container_id, name)
VALUES (NULL, NULL, 'a');
-- Check constraint 'tab_chk_1' is violated.

INSERT INTO tab(connection_id, parent_container_id, name)
VALUES (2, 2, 'a');
-- Check constraint 'tab_chk_1' is violated.

score 1 · Answer 3 · answered Jan 10 '22 at 03:08

1

Some things are best done in the application. Use it to do the enforcement. Or consider using Stored Procedure wrappers around the SQL code needed.

The one PRIMARY KEY for a table must include non-NULL column(s). But a UNIQUE secondary index may include NULL column(s).

Hence, neither of the two columns can be the PK. Nor can the combination of them.

answered Jan 10 '22 at 03:08

Rick James

135,179
13
127
222

1

A Relational db is a single recoverable unit, a shared corporate resource, open to all applications (not one). All rules pertaining to data should be Constraints in the db. There is nothing that cannot be defined in Codd's _RM_. – PerformanceDBA Jan 19 '22 at 11:44
@PerformanceDBA - Older versions of MySQL ignored `CHECK()`. – Rick James Jan 19 '22 at 16:56
1

Sure., and it still can't call a Function Posgres ignores BEGIN TRAN and cacks itself. The freeware & Oracle do not have ACID Transactions. The horrors of freeware. – PerformanceDBA Jan 19 '22 at 19:46

score 0 · Answer 4 · answered Jan 19 '22 at 02:12

The SQL92 standard defines:

A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns. In addition, if the unique constraint was defined with PRIMARY KEY, then it requires that none of the values in the specified column or columns be the null value.

For NULL, it can be understood as unknown. Although we don't know it now, it has many possibilities in the future, but we don't know it yet. Of course many people think it is a bug.

For the two fields parent_container and connection_id described above, why not from a business point of view, for NULL, set a value that is considered impossible in business instead, so as to avoid the limitation of MySQL unique key

Acceptable way to get around a UNIQUE w NULL constraint?

4 Answers4

Relational Solution

Data Model

Notation