Part 1: Encoding Algrebraic Data Types in Relational Tables
I've struggled with this very thing many times. I finally discovered the key to modeling algebraic data types in relational tables: Check constraints.
With a check constraint, you can use a common table for all members of your polymorphic type yet still enforce the invariant of each member.
Consider the following SQL schema:
CREATE TABLE ConcreteType (
Id TINYINT NOT NULL PRIMARY KEY,
Type VARCHAR(10) NOT NULL
)
INSERT ConcreteType
VALUES
(1,'Concrete1'),
(2,'Concrete2')
CREATE TABLE Base (
Id INT NOT NULL PRIMARY KEY,
Name VARCHAR(100) NOT NULL,
ConcreteTypeId TINYINT NOT NULL,
BaseReferenceId INT NULL)
GO
ALTER TABLE Base
ADD CONSTRAINT FK_Base_ConcreteType
FOREIGN KEY(ConcreteTypeId)
REFERENCES ConcreteType(Id)
ALTER TABLE Base
ADD CONSTRAINT FK_Base_BaseReference
FOREIGN KEY(BaseReferenceId)
REFERENCES Base(Id)
Simple, right?
We've addressed concern #1 of having meaningless data in the table representing the abstract base class by eliminating that table. We've also combined the tables that were used to model each concrete type independently, opting instead to store all Base
instances--regardless of their concrete type--in the same table.
As-is, this schema does not constrain the polymorphism of your Base
type. As-is, it is possible to insert rows of ConcreteType1
with a non-null BaseReferenceId
or rows of ConcereteType2
with a null BaseReferenceId
.
There is nothing keeping you from inserting invalid data, so you'd need to be very diligent about your inserts and edits.
This is where the check constraint really shines.
ALTER TABLE Base
ADD CONSTRAINT Base_Enforce_SumType_Properties
CHECK
(
(ConcreteTypeId = 1 AND BaseReferenceId IS NULL)
OR
(ConcreteTypeId = 2 AND BaseReferenceId IS NOT NULL)
)
The check constraint Base_Enforce_SumType_Properties
defines the invariants for each concrete type, protecting your data on insert and update. Go ahead and run all the DDL to create the ConcreteType
and Base
tables in your own database. Then try to insert rows into Base
that break the rules described in the check constraint. You can't! Finally, your data model holds together.
To address concern #2: Now that all members of your type are in a single table (with invariants enforced), your queries will be simpler. You don't even need "equivalent to the match
F# keyword in SQL". Adding a new concrete type is as simple as inserting a new row into the ConcreteType
table, adding any new properties as columns in the Base
table, and modifying the constraint to reflect any new invariants.
Part 2: Encoding hierarchical (read: recursive) relationships in SQL Server
Part of concern #2 I think about the complexity of querying across the 'parent-child' relationship that exists between ConcreteType2
and Base
. There are many ways to approach this kind of query and to pick one, we'd need a particular use case in mind.
Example use case: We wish to query every single Base
instance and assemble an object graph incorporating every row. This is easy; we don't even need a join. We just need a mutable Dictionary<int,Base>
with Id
used as the key.
It would be a lot to go into here but its something to consider: There is a MSSQL datatype named HierarchyID
(docs) that implements the 'materialized path' pattern, allowing easier modeling of hierarchies like yours. You could try using HierarchyID
instead of INT
on your Base.ID
/Base.BaseReferenceID
columns.
I hope this helps.