I often want to model relational data in my functional programs. For example, when developing a website I may want the following data structure to store info about my users:
data User = User
{ name :: String
, birthDate :: Date
}
Next, I want to store data about the messages users post on my site:
data Message = Message
{ user :: User
, timestamp :: Date
, content :: String
}
There are multiple problems associated with this data structure:
- We don't have any way of distinguishing users with the same name and birth date.
- The user data will be duplicated on serialisation/deserialisation.
- Comparing the users requires comparing their data, which may be a costly operation.
- Updates to the fields of User are fragile: you can forget to update all the occurrences of User in your data structure.
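The last point can be illustrated with a minimal sketch. It assumes Date is just a String (the question does not define it), and rename, alice and post are hypothetical names introduced only for this example:

```haskell
-- Assumption: Date is not defined in the question, so a String stands in for it.
type Date = String

data User = User
  { name      :: String
  , birthDate :: Date
  } deriving (Eq, Show)

data Message = Message
  { user      :: User
  , timestamp :: Date
  , content   :: String
  } deriving (Eq, Show)

-- Hypothetical helper: produce a renamed copy of a user.
rename :: String -> User -> User
rename newName u = u { name = newName }

alice :: User
alice = User { name = "Alice", birthDate = "1990-01-01" }

post :: Message
post = Message { user = alice, timestamp = "2024-01-01", content = "hello" }

-- Renaming alice builds a new value; the copy embedded in post is untouched.
alice' :: User
alice' = rename "Alicia" alice

main :: IO ()
main = do
  print (name alice')       -- the updated name
  print (name (user post))  -- the message still carries the old name
```

Every Message that embeds the user would have to be found and rebuilt by hand to keep the data consistent.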
These problems are manageable while our data can be represented as a tree. For example, you can refactor like this:
data User = User
{ name :: String
, birthDate :: Date
, messages :: [(String, Date)] -- you get the idea
}
However, it is possible to have your data shaped as a DAG (imagine any many-to-many relation), or even as a general graph (OK, maybe not). In this case, I tend to simulate the relational database by storing my data in Maps:
import Data.Map (Map)
newtype Id a = Id Integer deriving (Eq, Ord) -- Ord is needed for Map keys
type Table a = Map (Id a) a
This kind of works, but is unsafe and ugly for multiple reasons:
- You are just an Id constructor call away from nonsensical lookups.
- On lookup you get Maybe a, but often the database structurally ensures that there is a value.
- It is clumsy.
- It is hard to ensure referential integrity of your data.
- Managing indices (which are very much necessary for performance) and ensuring their integrity is even harder and clumsier.
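To make the first two points concrete, here is a rough sketch of the Map-based approach in use. It assumes Date is a String, and the author field, the users and messages tables, and the authorName helper are hypothetical names chosen for illustration:

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

newtype Id a = Id Integer deriving (Eq, Ord, Show)
type Table a = Map (Id a) a

-- Assumption: Date is not defined in the question, so a String stands in for it.
type Date = String

data User = User
  { name      :: String
  , birthDate :: Date
  } deriving (Eq, Show)

-- The message now refers to its author by key instead of embedding the User.
data Message = Message
  { author    :: Id User
  , timestamp :: Date
  , content   :: String
  } deriving (Eq, Show)

users :: Table User
users = Map.fromList [(Id 1, User "Alice" "1990-01-01")]

messages :: Table Message
messages = Map.fromList [(Id 1, Message (Id 1) "2024-01-01" "hello")]

-- Nothing stops a caller from forging a key that was never issued.
forged :: Maybe User
forged = Map.lookup (Id 42) users

-- Following a reference goes through Maybe at every step, even though
-- the tables above were built so that the author always exists.
authorName :: Id Message -> Maybe String
authorName mid = do
  msg <- Map.lookup mid messages
  usr <- Map.lookup (author msg) users
  pure (name usr)

main :: IO ()
main = do
  print forged               -- Nothing: the forged key matches no user
  print (authorName (Id 1))  -- Just "Alice"
```

The phantom type parameter does prevent looking up an Id Message in the users table, but it cannot prevent a made-up Id User, and the types do not record that every author key is present.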
Is there existing work on overcoming these problems?
It looks like Template Haskell could solve them (as it usually does), but I would rather not reinvent the wheel.