Why are type-safe relational operations so difficult?

Question

I was trying to code a relational problem in Haskell, when I had to find out that doing this in a type safe manner is far from obvious. E.g. a humble

select 1,a,b, from T

already raises a number of questions:

what is the type of this function?
what is the type of the projection 1,a,b ? What is the type of a projection in general?
what is the result type and how do I express the relationship between the result type and the projection?
what is the type of such a function which accepts any valid projection?
how can I detect invalid projections at compile time ?
How would I add a column to a table or to a projection?

I believe even Oracle's PL/SQL language does not get this quite right. While invald projections are mostly detected at compile time, the is a large number of type errors which only show at runtime. Most other bindings to RDBMSs (e.g. Java's jdbc and perl's DBI) use SQL contained in Strings and thus give up type-safety entirely.

Further research showed that there are some Haskell libraries (HList, vinyl and TRex), which provide type-safe extensible records and some more. But these libraries all require Haskell extensions like DataKinds, FlexibleContexts and many more. Furthermore these libraries are not easy to use and have a smell of trickery, at least to uninitialized observers like me.

This suggests, that type-safe relational operations do not fit in well with the functional paradigm, at least not as it is implemented in Haskell.

My questions are the following:

What are the fundamental causes of this difficulty to model relational operations in a type safe way. Where does Hindley-Milner fall short? Or does the problem originate at typed lambda calculus already?
Is there a paradigm, where relational operations are first class citizens? And if so, is there a real-world implementation?

Have you looked at the libraries which allow type safe DB access in Haskell? E.g. https://hackage.haskell.org/package/esqueleto or http://www.yesodweb.com/book/persistent ? — chi, May 02 '15 at 08:49
Not yet. Database access is not my primary concern. But those packages must have solved my problems one way or another, so I might take a look. Thanks. — Martin Drautzburg, May 02 '15 at 09:20

Gabriella Gonzalez · Answer 1 · 2015-05-03T15:45:36.350

Let's define a table indexed on some columns as a type with two type parameters:

data IndexedTable k v = ???

groupBy :: (v -> k) -> IndexedTable k v

-- A table without an index just has an empty key
type Table = IndexedTable ()

k will be a (possibly nested) tuple of all columns that the table is indexed on. v will be a (possibly nested) tuple of all columns that the table is not indexed on.

So, for example, if we had the following table

| Id | First Name | Last Name |
|----|------------|-----------|
|  0 | Gabriel    | Gonzalez  |
|  1 | Oscar      | Boykin    |
|  2 | Edgar      | Codd      |

... and it were indexed on the first column, then the type would be:

type Id = Int
type FirstName = String
type LastName = String

IndexedTable Int (FirstName, LastName)

However, if it were indexed on the first and second column, then the type would be:

IndexedTable (Int, Firstname) LastName

Table would implement the Functor, Applicative, and Alternative type classes. In other words:

instance Functor (IndexedTable k)

instance Applicative (IndexedTable k)

instance Alternative (IndexedTable k)

So joins would be implemented as:

join :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, v2)
join t1 t2 = liftA2 (,) t1 t2

leftJoin :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, Maybe v2)
leftJoin t1 t2 = liftA2 (,) t1 (optional t2)

rightJoin :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (Maybe v1, v2)
rightJoin t1 t2 = liftA2 (,) (optional t1) t2

Then you would have a separate type that we will call a Select. This type will also have two type parameters:

data Select v r = ???

A Select would consume a bunch of rows of type v from the table and produce a result of type r. In other words, we should have a function of type:

selectIndexed :: Indexed k v -> Select v r -> r

Some example Selects that we might define would be:

count   :: Select v Integer
sum     :: Num a => Select a a
product :: Num a => Select a a
max     :: Ord a => Select a a

This Select type would implement the Applicative interface, so we could combine multiple Selects into a single Select. For example:

liftA2 (,) count sum :: Select Integer (Integer, Integer)

That would be analogous to this SQL:

SELECT COUNT(*), SUM(*)

However, often our table will have multiple columns, so we need a way to focus a Select onto a single column. Let's call this function Focus:

focus :: Lens' a b -> Select b r -> Select a r

So that we can write things like:

liftA3 (,,) (focus _1 sum) (focus _2 product) (focus _3 max)
  :: (Num a, Num b, Ord c)
  => Select (a, b, c) (a, b, c)

So if we wanted to write something like:

SELECT COUNT(*), MAX(firstName) FROM t

That would be equivalent to this Haskell code:

firstName :: Lens' Row String

table :: Table Row

select table (liftA2 (,) count (focus firstName max)) :: (Integer, String)

So you might wonder how one might implement Select and Table.

I describe how to implement Table in this post:

http://www.haskellforall.com/2014/12/a-very-general-api-for-relational-joins.html

... and you can implement Select as just:

type Select = Control.Foldl.Fold

type focus = Control.Foldl.pretraverse

-- Assuming you define a `Foldable` instance for `IndexedTable`
select t s = Control.Foldl.fold s t

Also, keep in mind that these are not the only ways to implement Table and Select. They are just a simple implementation to get you started and you can generalize them as necessary.

What about selecting columns from a table? Well, you can define:

column :: Select a (Table a)
column = Control.Foldl.list

So if you wanted to do:

SELECT col FROM t

... you would write:

field :: Lens' Row Field

table :: Table Row

select (focus field column) table :: [Field]

The important takeaway is that you can implement a relational API in Haskell just fine without any fancy type system extensions.

I find two things confusing: (1) in your reasoning the _keys_ play an awfully prominent role. I believe keys are required for certain guarantees, but not for the relational operations as such. (2) Shouldn't the result type of _select_ be a Table again? — Martin Drautzburg, May 03 '15 at 06:53
I updated my post. (1) For a table where you don't care about the key just set the type of the key to `()`. (2) You can get back a table just by applying an appropriate select that returns a table. A `Select` can return a list/table of results. — Gabriella Gonzalez, May 03 '15 at 15:46

Why are type-safe relational operations so difficult?

1 Answers1