First, the types you mention are not really GADTs, they are plain ADTs, since the return type of each constructor is always T a
. A proper GADT would be something like
data T a where
K1 :: T Bool -- not T a
Anyway, the technique you mention is a well known method to encode algebraic data types into (polymorphic) functions. It goes under many names, like Church encoding, Boehm-Berarducci encoding, endcoding as a catamorphism, etc. Sometimes the Yoneda lemma is used to justify this approach, but there's no need to understand the category-theoretic machinery to understand the method.
Basically, the idea is the following. All the ADTs can be generated by
- product types
(,) a b
- sum types
Either a b
- arrow types
a -> b
- unit type
()
- void type
Void
(rarely used in Haskell, but theoretically nice)
- variables (if the type bing defined has parameters)
- possibly, basic types (
Integer
, ...)
- type level-recursion
Type level recursion is used when some value constructor takes the type which is being defined as an argument. The classic example is Peano-style naturals:
data Nat where
O :: Nat
S :: Nat -> Nat
-- ^^^ recursive!
Lists are also common:
data List a where
Nil :: List a
Cons :: a -> List a -> List a
-- ^^^^^^ recursive!
Types like Maybe a
, pairs, etc. are non recursive.
Note that each ADT, recursive or not, can be reduced to a single constructor with a sigle argument, by summing (Either
) over all the constructors, and multiplying all the arguments. For instance, Nat
is isomorphic to
data Nat1 where
O1 :: () -> Nat
S1 :: Nat -> Nat
which is isomorphic to
data Nat2 where K2 :: (Either () Nat) -> Nat
Lists become
data List1 a where K1 :: (Either () (a, List a)) -> List a
The steps above make use of the algebra of types, which makes the sum and products of types obey the same rules as high school algebra, while a -> b
behaves like the exponential b^a
.
Hence, we can write any ADT in the form
-- pseudo code
data T where
K :: F T -> T
type F k = .....
For instance
type F_Nat k = Either () k -- for T = Nat
type F_List_a k = Either () (a, k) -- for T = List a
(Note that the latter type function F
depends on a
, but it's not important right now.)
Non recursive types will not use k
:
type F_Maybe_a k = Either () a -- for T = Maybe a
Note that constructor K
above makes the type T
isomorphic to F T
(let's ignore the lifting / extra bottom introduced by it). Essentially, we have that
Nat ~= F Nat = Either () Nat
List a ~= F (List a) = Either () (a, List a)
Maybe a ~= F (Maybe a) = Either () a
We can even formalize this further by abstracting from F
newtype Fix f = Fix { unFix :: f (Fix f) }
By definition Fix F
will now be isomorphic to F (Fix F)
. We could let
type Nat = Fix F_Nat
(In Haskell, we need a newtype wrapper around F_Nat
, which I omit for clarity.)
Finally, the general encoding, or catamorphism, is:
cata :: (F k -> k) -> Fix F -> k
This assumes that F
is a functor.
For Nat
, we get
cata :: (Either () k -> k) -> Nat -> k
-- isomorphic to
cata :: (() -> k, k -> k) -> Nat -> k
-- isomorphic to
cata :: (k, k -> k) -> Nat -> k
-- isomorphic to
cata :: k -> (k -> k) -> Nat -> k
Note the "high school albegra" steps, where k^(1+k) = k^1 * k^k
, hence Either () k -> k ~= (() -> k, k -> k)
.
Note that we get two arguments, k
and k->k
which correspond to O
and S
. This is not a coincidence -- we summed over all the constructors. This means that cata
expects to be passed the value of type k
which "plays the role of O
" there, and then the value of type k -> k
which plays the role of S
.
More informally, cata
is telling us that, if we want to map a natural in k
, we only have to state what is the "zero" inside k
and how to take the "successor" in k
, and then every Nat
can be mapped consequently.
For lists we get:
cata :: (Either () (a, k) -> k) -> List a -> k
-- isomorphic to
cata :: (() -> k, (a, k) -> k) -> List a -> k
-- isomorphic to
cata :: (k, a -> k -> k) -> List a -> k
-- isomorphic to
cata :: k -> (a -> k -> k) -> List a -> k
which is foldr
.
Again, this is cata
telling us that, if we state how to take the "empty list" in k
and to "cons" a
and k
inside k
, we can map any list in k
.
Maybe a
is the same:
cata :: (Either () a -> k) -> Maybe a -> k
-- isomorphic to
cata :: (() -> k, a -> k) -> Maybe a -> k
-- isomorphic to
cata :: (k, a -> k) -> Maybe a -> k
-- isomorphic to
cata :: k -> (a -> k) -> Maybe a -> k
If we can map Nothing
in k
, and perform Just
mapping a
in k
, the we can map any Maybe a
in k
.
If we try to apply the same approach to Bool
and (a,b)
we reach the functions which were posted in the questions.
More advanced theoretical topics to look up:
- (initial) F-algebras in category theory
- eliminators / recursors / induction principles in type theory (these can be applied to GADTs as well)