As pigworker remarked in the question comments, using the default `Generic` representation leads to great ugliness, since we have no prior information about recursion in our type and have to dig out recursive occurrences by manually checking for type equality. Here I'd like to present alternative solutions with explicit F-algebra-style recursion. For this, we need an alternative generic `Rep`. Sadly, this means we can't easily tap into `GHC.Generics`, but I hope this will be edifying nonetheless.
In my first solution I aim for a presentation that is as simple as possible within current GHC capabilities. The second solution is a `TypeApplications`-heavy, GHC 8-based one with more sophisticated types.

Starting out as usual:
```haskell
{-# language
  TypeOperators, DataKinds, PolyKinds,
  RankNTypes, EmptyCase, ScopedTypeVariables,
  DeriveFunctor, StandaloneDeriving, GADTs,
  TypeFamilies, FlexibleContexts, FlexibleInstances #-}
```
My generic representation is a fixpoint of a sum-of-products. It slightly extends the basic model of `generics-sop`, which is also a sum-of-products but not functorial, and therefore ill-equipped for recursive algorithms. I think SOP is overall a much better practical representation than arbitrarily nested types; the `generics-sop` paper gives extended arguments for why. In short, SOP removes unnecessary nesting information and lets us separate metadata from basic data.
But before anything else, we should decide on a code for generic types. In vanilla `GHC.Generics` there isn't a well-defined kind of codes: the type constructors for sums, products etc. form an ad-hoc type-level grammar, and we dispatch on them using type classes. We adhere more closely to the usual presentations in dependently typed generics, and use explicit codes, interpretations and functions. Our codes shall be of kind `[[Maybe *]]`.
The outer list encodes a sum of constructors, with each inner `[Maybe *]` encoding a constructor. A `Just t` entry is an ordinary (non-recursive) field of type `t`, while `Nothing` denotes a recursive field. For example, the code of `[Int]` is `['[], [Just Int, Nothing]]`.
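To make the encoding concrete, here is a small sketch of codes for a few more types, written as plain type synonyms. The names `ListCode`, `MaybeCode`, `TreeCode`, `codeProxy` and the `Tree` type are hypothetical illustrations, not part of the answer; the block also uses `Type` from `Data.Kind` in place of `*` and explicit promotion ticks.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, PolyKinds #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))

-- Hypothetical example type (not from the answer), used only to read off a code.
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- [a]: [] has no fields; (:) has a plain field and one recursive field.
type ListCode (a :: Type) = '[ '[], '[ 'Just a, 'Nothing ] ]

-- Maybe a: Nothing has no fields; Just has a single plain field.
type MaybeCode (a :: Type) = '[ '[], '[ 'Just a ] ]

-- Tree a: Leaf has no fields; Node is recursive, plain, recursive.
type TreeCode (a :: Type) = '[ '[], '[ 'Nothing, 'Just a, 'Nothing ] ]

-- Hypothetical helper: only accepts proxies of codes of kind [[Maybe Type]].
codeProxy :: Proxy (c :: [[Maybe Type]]) -> Proxy c
codeProxy = id
```

Each constructor contributes one inner list, and each recursive child (such as the two subtrees of `Node`) becomes one `'Nothing` entry.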
```haskell
type Rep a = Fix (SOP (Code a))

class Generic a where
  type Code a :: [[Maybe *]]
  to   :: a -> Rep a
  from :: Rep a -> a

data NP (ts :: [Maybe *]) (k :: *) where
  Nil  :: NP '[] k
  (:>) :: t -> NP ts k -> NP (Just t ': ts) k
  Rec  :: k -> NP ts k -> NP (Nothing ': ts) k
infixr 5 :>

data SOP (code :: [[Maybe *]]) (k :: *) where
  Z :: NP ts k -> SOP (ts ': code) k
  S :: SOP code k -> SOP (ts ': code) k
```
Note that `NP` has different constructors for recursive and non-recursive fields. This is quite important, because we want codes to be unambiguously reflected in the type indices. In other words, we would like `NP` to also act as a singleton for `[Maybe *]` (although we remain parametric in `*` for good reasons).
We use a `k` parameter in the definitions to leave a hole for recursion. We set up recursion as usual, leaving the `Functor` instances to GHC:
```haskell
deriving instance Functor (SOP code)
deriving instance Functor (NP code)

newtype Fix f = In {out :: f (Fix f)}

cata :: Functor f => (f a -> a) -> Fix f -> a
cata phi = go where go = phi . fmap go . out
```
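To see the `Fix`/`cata` pattern in isolation, here is a minimal sketch on a hand-written list base functor. `ListF`, `fromList`, `sumAlg` and `sumList` are hypothetical names, not part of the answer; in the generic setting, `SOP code` plays the role that `ListF e` plays here.

```haskell
{-# LANGUAGE DeriveFunctor #-}

newtype Fix f = In { out :: f (Fix f) }

cata :: Functor f => (f a -> a) -> Fix f -> a
cata phi = go where go = phi . fmap go . out

-- Hypothetical base functor for lists of e; k marks the recursive position.
data ListF e k = NilF | ConsF e k deriving Functor

-- Build a Fix-encoded list from an ordinary one.
fromList :: [e] -> Fix (ListF e)
fromList = foldr (\x xs -> In (ConsF x xs)) (In NilF)

-- An algebra handles one layer whose recursive slot is already folded.
sumAlg :: ListF Int Int -> Int
sumAlg NilF = 0
sumAlg (ConsF x acc) = x + acc

sumList :: [Int] -> Int
sumList = cata sumAlg . fromList
```

`fmap go` in `cata` is what replaces each recursive `Fix` child by its folded result before `phi` sees the layer.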
We have two type families:
```haskell
type family CurryNP (ts :: [Maybe *]) (r :: *) :: * where
  CurryNP '[] r = r
  CurryNP (Just t ': ts) r = t -> CurryNP ts r
  CurryNP (Nothing ': ts) r = r -> CurryNP ts r

type family Alg (code :: [[Maybe *]]) (r :: *) :: * where
  Alg '[] r = ()
  Alg (ts ': tss) r = (CurryNP ts r, Alg tss r)
```
`CurryNP ts r` curries `NP ts` with result type `r`, and also plugs `r` into the recursive occurrences.

`Alg code r` computes the type of an algebra on `SOP code r`. It tuples together the eliminators for the individual constructors. Here we use plain nested tuples, but of course `HList`s would be adequate too. We could also reuse `NP` here as an `HList`, but I find that too kludgy.
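A quick way to see what `Alg` computes is to let GHC check a value against it. In this sketch, `IntListCode`, `sumAlgTuple` and `sumAlgTuple'` are hypothetical names; the second signature spells out the type that `Alg IntListCode Int` should reduce to, so the block only typechecks if that reduction holds.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)

type family CurryNP (ts :: [Maybe Type]) (r :: Type) :: Type where
  CurryNP '[] r = r
  CurryNP ('Just t ': ts) r = t -> CurryNP ts r
  CurryNP ('Nothing ': ts) r = r -> CurryNP ts r

type family Alg (code :: [[Maybe Type]]) (r :: Type) :: Type where
  Alg '[] r = ()
  Alg (ts ': tss) r = (CurryNP ts r, Alg tss r)

-- Hypothetical synonym: the code of [Int].
type IntListCode = '[ '[], '[ 'Just Int, 'Nothing ] ]

-- One eliminator per constructor: a value for [], a function for (:).
sumAlgTuple :: Alg IntListCode Int
sumAlgTuple = (0, ((+), ()))

-- The same type written out by hand; accepted only if Alg reduces as claimed.
sumAlgTuple' :: (Int, (Int -> Int -> Int, ()))
sumAlgTuple' = sumAlgTuple
```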
All that's left is to implement the functions:
```haskell
uncurryNP :: CurryNP ts a -> NP ts a -> a
uncurryNP f Nil = f
uncurryNP f (x :> xs) = uncurryNP (f x) xs
uncurryNP f (Rec k xs) = uncurryNP (f k) xs

algSOP :: Alg code a -> SOP code a -> a
algSOP fs (Z np) = uncurryNP (fst fs) np
algSOP fs (S sop) = algSOP (snd fs) sop

gcata :: Generic a => Alg (Code a) r -> a -> r
gcata f = cata (algSOP f) . to
```
The key point here is that we have to convert the curried eliminators in `Alg` into a "proper" `SOP code a -> a` algebra, since that is the form that can be directly used in `cata`.
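Here is a small sketch of that conversion on a single layer, without `Fix`. The recursive slot of the `(:)` layer already holds a folded result (here `10`), exactly as it would once `cata` has processed the tail. `IntListCode`, `nilLayer`, `consLayer` and `sumAlgebra` are hypothetical names; the rest repeats the definitions above, with `Type` in place of `*`.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)

data NP (ts :: [Maybe Type]) (k :: Type) where
  Nil  :: NP '[] k
  (:>) :: t -> NP ts k -> NP ('Just t ': ts) k
  Rec  :: k -> NP ts k -> NP ('Nothing ': ts) k
infixr 5 :>

data SOP (code :: [[Maybe Type]]) (k :: Type) where
  Z :: NP ts k -> SOP (ts ': code) k
  S :: SOP code k -> SOP (ts ': code) k

type family CurryNP (ts :: [Maybe Type]) (r :: Type) :: Type where
  CurryNP '[] r = r
  CurryNP ('Just t ': ts) r = t -> CurryNP ts r
  CurryNP ('Nothing ': ts) r = r -> CurryNP ts r

type family Alg (code :: [[Maybe Type]]) (r :: Type) :: Type where
  Alg '[] r = ()
  Alg (ts ': tss) r = (CurryNP ts r, Alg tss r)

uncurryNP :: CurryNP ts a -> NP ts a -> a
uncurryNP f Nil = f
uncurryNP f (x :> xs) = uncurryNP (f x) xs
uncurryNP f (Rec k xs) = uncurryNP (f k) xs

algSOP :: Alg code a -> SOP code a -> a
algSOP fs (Z np) = uncurryNP (fst fs) np
algSOP fs (S sop) = algSOP (snd fs) sop

-- Hypothetical: the code of [Int] and two single layers of it.
type IntListCode = '[ '[], '[ 'Just Int, 'Nothing ] ]

nilLayer :: SOP IntListCode Int
nilLayer = Z Nil                      -- first constructor: []

consLayer :: SOP IntListCode Int
consLayer = S (Z (3 :> Rec 10 Nil))   -- second constructor: 3 : <folded tail = 10>

-- The tuple of curried eliminators, turned into an SOP algebra by algSOP.
sumAlgebra :: SOP IntListCode Int -> Int
sumAlgebra = algSOP (0, ((+), ()))
```

`S` skips past the first eliminator with `snd`, `Z` selects the current one with `fst`, and `uncurryNP` then feeds it the fields in order.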
Let's define some sugar and instances:
```haskell
(<:) :: a -> b -> (a, b)
(<:) = (,)
infixr 5 <:

instance Generic (Fix (SOP code)) where
  type Code (Fix (SOP code)) = code
  to = id
  from = id

instance Generic [a] where
  type Code [a] = ['[], [Just a, Nothing]]
  to = foldr (\x xs -> In (S (Z (x :> Rec xs Nil)))) (In (Z Nil))
  from = gcata ([] <: (:) <: ()) -- note the use of "Generic (Rep [a])"
```
Example:

```
> gcata (0 <: (+) <: ()) [0..10]
55
```
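As a further sketch, the same machinery can fold a type with two recursive fields. `Tree`, its `Generic` instance, `sumTree` and `depth` are hypothetical additions, not from the answer; the block repeats the definitions above (with `Type` from `Data.Kind` in place of `*`) so that it compiles on its own.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies, TypeOperators,
             DeriveFunctor, StandaloneDeriving, FlexibleInstances #-}
import Data.Kind (Type)

data NP (ts :: [Maybe Type]) (k :: Type) where
  Nil  :: NP '[] k
  (:>) :: t -> NP ts k -> NP ('Just t ': ts) k
  Rec  :: k -> NP ts k -> NP ('Nothing ': ts) k
infixr 5 :>

data SOP (code :: [[Maybe Type]]) (k :: Type) where
  Z :: NP ts k -> SOP (ts ': code) k
  S :: SOP code k -> SOP (ts ': code) k

deriving instance Functor (NP ts)
deriving instance Functor (SOP code)

newtype Fix f = In { out :: f (Fix f) }

cata :: Functor f => (f a -> a) -> Fix f -> a
cata phi = go where go = phi . fmap go . out

type family CurryNP (ts :: [Maybe Type]) (r :: Type) :: Type where
  CurryNP '[] r = r
  CurryNP ('Just t ': ts) r = t -> CurryNP ts r
  CurryNP ('Nothing ': ts) r = r -> CurryNP ts r

type family Alg (code :: [[Maybe Type]]) (r :: Type) :: Type where
  Alg '[] r = ()
  Alg (ts ': tss) r = (CurryNP ts r, Alg tss r)

uncurryNP :: CurryNP ts a -> NP ts a -> a
uncurryNP f Nil = f
uncurryNP f (x :> xs) = uncurryNP (f x) xs
uncurryNP f (Rec k xs) = uncurryNP (f k) xs

algSOP :: Alg code a -> SOP code a -> a
algSOP fs (Z np) = uncurryNP (fst fs) np
algSOP fs (S sop) = algSOP (snd fs) sop

type Rep a = Fix (SOP (Code a))

class Generic a where
  type Code a :: [[Maybe Type]]
  to   :: a -> Rep a
  from :: Rep a -> a

gcata :: Generic a => Alg (Code a) r -> a -> r
gcata f = cata (algSOP f) . to

(<:) :: a -> b -> (a, b)
(<:) = (,)
infixr 5 <:

instance Generic (Fix (SOP code)) where
  type Code (Fix (SOP code)) = code
  to = id
  from = id

-- Hypothetical example type: Node has two recursive fields around a value.
data Tree a = Leaf | Node (Tree a) a (Tree a)

instance Generic (Tree a) where
  type Code (Tree a) = '[ '[], '[ 'Nothing, 'Just a, 'Nothing ] ]
  to Leaf         = In (Z Nil)
  to (Node l x r) = In (S (Z (Rec (to l) (x :> Rec (to r) Nil))))
  from = gcata (Leaf <: Node <: ())

-- Eliminator for Node receives the already-folded subtrees.
sumTree :: Num a => Tree a -> a
sumTree = gcata (0 <: (\l x r -> l + x + r) <: ())

depth :: Tree a -> Int
depth = gcata (0 <: (\l _ r -> 1 + max l r) <: ())
```

For `Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 (Node Leaf 4 Leaf))`, `sumTree` should give 10 and `depth` should give 3.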
However, it would be nicer to have currying and not need `HList`s or tuples to store eliminators. The most convenient way is to use the same argument order as standard library folds such as `foldr` or `maybe`. In this case the type of `gcata` is given by a type family that computes it from the generic code of a type.
```haskell
type family CurryNP (ts :: [Maybe *]) (r :: *) :: * where
  CurryNP '[] r = r
  CurryNP (Just t ': ts) r = t -> CurryNP ts r
  CurryNP (Nothing ': ts) r = r -> CurryNP ts r

type family Fold' code a r where
  Fold' '[] a r = r
  Fold' (ts ': tss) a r = CurryNP ts a -> Fold' tss a r

type Fold a r = Fold' (Code a) r (a -> r)

gcata :: forall a r. Generic a => Fold a r  -- implementation omitted (see below)
```
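This `gcata` is fully ambiguous: both `a` and `r` occur only under type families, so neither can be inferred from the arguments, and the signature needs `AllowAmbiguousTypes`. We need either explicit type application or a `Proxy`; I opted for the former, incurring a GHC 8 dependency. However, once we supply an `a` type, the result type reduces, and we can easily curry: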
```
> :t gcata @[_]
gcata @[_] :: Generic [t] => r -> (t -> r -> r) -> [t] -> r
> :t gcata @[_] 0
gcata @[_] 0 :: Num t1 => (t -> t1 -> t1) -> [t] -> t1
> gcata @[_] 0 (+) [0..10]
55
```
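The reduction that `:t` shows can also be checked statically, independently of any `gcata` implementation. In this sketch, `ListCode`, `MaybeCode`, `foldrLike` and `maybeLike` are hypothetical names: each definition is only accepted if `Fold'` reduces to the argument order of `foldr` and `maybe` respectively.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)

type family CurryNP (ts :: [Maybe Type]) (r :: Type) :: Type where
  CurryNP '[] r = r
  CurryNP ('Just t ': ts) r = t -> CurryNP ts r
  CurryNP ('Nothing ': ts) r = r -> CurryNP ts r

type family Fold' (code :: [[Maybe Type]]) (a :: Type) (r :: Type) :: Type where
  Fold' '[] a r = r
  Fold' (ts ': tss) a r = CurryNP ts a -> Fold' tss a r

-- Hypothetical synonyms for the codes of [t] and Maybe t.
type ListCode (t :: Type) = '[ '[], '[ 'Just t, 'Nothing ] ]
type MaybeCode (t :: Type) = '[ '[], '[ 'Just t ] ]

-- Reduces to r -> (t -> r -> r) -> [t] -> r, i.e. foldr with z first.
foldrLike :: Fold' (ListCode t) r ([t] -> r)
foldrLike z f = foldr f z

-- Reduces to r -> (t -> r) -> Maybe t -> r, exactly the type of maybe.
maybeLike :: Fold' (MaybeCode t) r (Maybe t -> r)
maybeLike = maybe
```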
Above I used a type wildcard in `@[_]`. We can also create a shorthand for this:

```haskell
gcata1 :: forall f a r. Generic (f a) => Fold (f a) r
gcata1 = gcata @(f a) @r
```

which can be used as `gcata1 @[]`.
I'd rather not elaborate on the implementation of the above `gcata` here. It's not much longer than the simple version, but it is pretty hairy (embarrassingly, it's responsible for my delayed answer). Right now I couldn't explain it very well, since I wrote it with Agda's aid, which entailed plenty of automatic search and type tetris.
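For readers who want something runnable, here is one possible way to get a curried `gcata`, sketched under explicit assumptions. It is *not* the Agda-assisted implementation mentioned above: it introduces a hypothetical helper class `CollectAlg` that walks the code, collects one curried eliminator per constructor into the tuple-based `Alg`, and then reuses `algSOP` and `cata` from the first solution. As a consequence its signature carries an extra `CollectAlg (Code a)` constraint, which GHC discharges automatically whenever `Code a` reduces to a concrete code. The `Generic (Maybe a)` instance is also an illustrative addition.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies, TypeOperators,
             DeriveFunctor, StandaloneDeriving, FlexibleInstances,
             FlexibleContexts, ScopedTypeVariables, TypeApplications,
             AllowAmbiguousTypes #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))

-- Representation and recursion as in the first solution (Type instead of *).
data NP (ts :: [Maybe Type]) (k :: Type) where
  Nil  :: NP '[] k
  (:>) :: t -> NP ts k -> NP ('Just t ': ts) k
  Rec  :: k -> NP ts k -> NP ('Nothing ': ts) k
infixr 5 :>

data SOP (code :: [[Maybe Type]]) (k :: Type) where
  Z :: NP ts k -> SOP (ts ': code) k
  S :: SOP code k -> SOP (ts ': code) k

deriving instance Functor (NP ts)
deriving instance Functor (SOP code)

newtype Fix f = In { out :: f (Fix f) }

cata :: Functor f => (f a -> a) -> Fix f -> a
cata phi = go where go = phi . fmap go . out

type family CurryNP (ts :: [Maybe Type]) (r :: Type) :: Type where
  CurryNP '[] r = r
  CurryNP ('Just t ': ts) r = t -> CurryNP ts r
  CurryNP ('Nothing ': ts) r = r -> CurryNP ts r

type family Alg (code :: [[Maybe Type]]) (r :: Type) :: Type where
  Alg '[] r = ()
  Alg (ts ': tss) r = (CurryNP ts r, Alg tss r)

type family Fold' (code :: [[Maybe Type]]) (a :: Type) (r :: Type) :: Type where
  Fold' '[] a r = r
  Fold' (ts ': tss) a r = CurryNP ts a -> Fold' tss a r

uncurryNP :: CurryNP ts a -> NP ts a -> a
uncurryNP f Nil = f
uncurryNP f (x :> xs) = uncurryNP (f x) xs
uncurryNP f (Rec k xs) = uncurryNP (f k) xs

algSOP :: Alg code a -> SOP code a -> a
algSOP fs (Z np) = uncurryNP (fst fs) np
algSOP fs (S sop) = algSOP (snd fs) sop

type Rep a = Fix (SOP (Code a))

class Generic a where
  type Code a :: [[Maybe Type]]
  to   :: a -> Rep a
  from :: Rep a -> a

type Fold a r = Fold' (Code a) r (a -> r)

-- Hypothetical helper class: consumes one curried eliminator per
-- constructor and packs them into the nested tuple that algSOP expects.
class CollectAlg (code :: [[Maybe Type]]) where
  collect :: Proxy code -> Proxy r -> (Alg code r -> x) -> Fold' code r x

instance CollectAlg '[] where
  collect _ _ k = k ()  -- no constructors left: hand over the finished tuple

instance CollectAlg tss => CollectAlg (ts ': tss) where
  collect _ pr k = \f -> collect (Proxy :: Proxy tss) pr (\rest -> k (f, rest))

-- Curried catamorphism; note the extra CollectAlg (Code a) constraint.
gcata :: forall a r. (Generic a, CollectAlg (Code a)) => Fold a r
gcata = collect (Proxy :: Proxy (Code a)) (Proxy :: Proxy r)
                (\alg -> (cata (algSOP alg) . to :: a -> r))

instance Generic [a] where
  type Code [a] = '[ '[], '[ 'Just a, 'Nothing ] ]
  to = foldr (\x xs -> In (S (Z (x :> Rec xs Nil)))) (In (Z Nil))
  from = cata (algSOP ([], ((:), ())))

-- Hypothetical extra instance, to show the same argument order as maybe.
instance Generic (Maybe a) where
  type Code (Maybe a) = '[ '[], '[ 'Just a ] ]
  to Nothing  = In (Z Nil)
  to (Just x) = In (S (Z (x :> Nil)))
  from = cata (algSOP (Nothing, (Just, ())))
```

With this sketch, `gcata @[Int] @Int 0 (+) [0 .. 10]` should evaluate to 55 and `gcata @(Maybe Int) @Int 0 (+ 1) (Just 41)` to 42. The trade-off against the answer's version is the visible `CollectAlg` constraint in exchange for a short, class-directed implementation.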