4

I'm modeling an application in F# and I encountered a difficulty when trying to construct the database tables for the following recursive type :

type Base = 
  | Concrete1 of Concrete1
  | Concrete2 of Concrete2
and Concrete1 = {
  Id : string
  Name : string }
and Concrete2 = {
  Id : string
  Name : string
  BaseReference : Base }

The solution I've got for the moment (I've found inspiration here http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server) is :

enter image description here

I have two concerns with this solution :

  1. There will be rows on the Base table even though that doesn't make sense in my model. But I can live with that.

  2. It seems that queries to find all the information about BaseReference of Concrete2 will be complex since I will have to take into account the recursivity of the type and the different concrete tables. Moreover, adding a new concrete type to the model must modify these queries. Unless of course there is an equivalent to the match F# keyword in SQL.

Am I worrying too much about these concerns? or maybe, is there a better way to model this recursive F# type in SQL tables?

Mario
  • 313
  • 2
  • 11
  • 1
    It seems to me that your type Base is roughly equivalent to a non-empty list of (Id, Name) tuples? Is that so, or did you trim the example in order to ask it? – Robert Nielsen Oct 12 '17 at 16:29
  • 1
    @RobertNielsen : At first I trim the example in order to ask, but now I think I will leave like that (a list of (Id, Type)) and I will create another table to contain the common information for the concrete types, a BasedInformation table that every concrete table will reference with a foreign key. I do that because otherwise the Base table will contain the same information between rows. – Mario Oct 13 '17 at 08:22

1 Answers1

0

Part 1: Encoding Algrebraic Data Types in Relational Tables

I've struggled with this very thing many times. I finally discovered the key to modeling algebraic data types in relational tables: Check constraints.

With a check constraint, you can use a common table for all members of your polymorphic type yet still enforce the invariant of each member.

Consider the following SQL schema:

CREATE TABLE ConcreteType (
    Id TINYINT NOT NULL PRIMARY KEY,
    Type VARCHAR(10) NOT NULL
)
INSERT ConcreteType
VALUES 
(1,'Concrete1'),
(2,'Concrete2')

CREATE TABLE Base (
    Id INT NOT NULL PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    ConcreteTypeId TINYINT NOT NULL,
    BaseReferenceId INT NULL)

GO

ALTER TABLE Base
ADD CONSTRAINT FK_Base_ConcreteType
FOREIGN KEY(ConcreteTypeId)
REFERENCES ConcreteType(Id)

ALTER TABLE Base
ADD CONSTRAINT FK_Base_BaseReference
FOREIGN KEY(BaseReferenceId)
REFERENCES Base(Id)

Simple, right?

We've addressed concern #1 of having meaningless data in the table representing the abstract base class by eliminating that table. We've also combined the tables that were used to model each concrete type independently, opting instead to store all Base instances--regardless of their concrete type--in the same table.

As-is, this schema does not constrain the polymorphism of your Base type. As-is, it is possible to insert rows of ConcreteType1 with a non-null BaseReferenceId or rows of ConcereteType2 with a null BaseReferenceId. There is nothing keeping you from inserting invalid data, so you'd need to be very diligent about your inserts and edits.

This is where the check constraint really shines.

ALTER TABLE Base
ADD CONSTRAINT Base_Enforce_SumType_Properties
CHECK
(
    (ConcreteTypeId = 1 AND BaseReferenceId IS NULL)
    OR
    (ConcreteTypeId = 2 AND BaseReferenceId IS NOT NULL)
)

The check constraint Base_Enforce_SumType_Properties defines the invariants for each concrete type, protecting your data on insert and update. Go ahead and run all the DDL to create the ConcreteType and Base tables in your own database. Then try to insert rows into Base that break the rules described in the check constraint. You can't! Finally, your data model holds together.

To address concern #2: Now that all members of your type are in a single table (with invariants enforced), your queries will be simpler. You don't even need "equivalent to the match F# keyword in SQL". Adding a new concrete type is as simple as inserting a new row into the ConcreteType table, adding any new properties as columns in the Base table, and modifying the constraint to reflect any new invariants.

Part 2: Encoding hierarchical (read: recursive) relationships in SQL Server

Part of concern #2 I think about the complexity of querying across the 'parent-child' relationship that exists between ConcreteType2 and Base. There are many ways to approach this kind of query and to pick one, we'd need a particular use case in mind.

Example use case: We wish to query every single Base instance and assemble an object graph incorporating every row. This is easy; we don't even need a join. We just need a mutable Dictionary<int,Base> with Id used as the key.

It would be a lot to go into here but its something to consider: There is a MSSQL datatype named HierarchyID (docs) that implements the 'materialized path' pattern, allowing easier modeling of hierarchies like yours. You could try using HierarchyID instead of INT on your Base.ID/Base.BaseReferenceID columns.

I hope this helps.

Jonathan Wilson
  • 4,138
  • 1
  • 24
  • 36