Parsing and the use of GADTs

Question

I've ran into a problem while writing a parser. Specifically, I want to be return values of different types. For example, I have two different data types FA and PA to represent two different lipid classes -

data FA = ClassLevelFA IntegerMass
        | FA           CarbonChain
        deriving (Show, Eq, Ord)

data PA   = ClassLevelPA       IntegerMass
          | CombinedRadylsPA   TwoCombinedRadyls
          | UnknownSnPA        Radyl Radyl
          | KnownSnPA          Radyl Radyl
          deriving (Show, Eq, Ord)

Using attoparsec, I have built parsers to parse lipid shorthand notation. For the data types above, I have the parsers faParser and paParser. I would like to be able to parse for either an FA or PA. However, since FA and PA are different data types, I cannot do the following -

inputParser =  faParser
           <|> paParser

I have recently learnt about GADTs and I thought this would solve my problem. Consequently I went and made a GADT and an eval function and changed the parsers faParser and paParser. -

data ParsedLipid a where
  ParsedFA :: FA -> ParsedLipid FA
  ParsedPA :: PA -> ParsedLipid PA

eval :: ParsedLipid a -> a
eval (ParsedFA val) = val
eval (ParsedPA val) = val

This gets me close but it appears as if ParsedFA and ParsedPA are different data types? For example, parsing "PA 17:1_18:1" gives me a value of type ParsedLipid PA (not just ParsedLipid as I was expecting). Therefore, the parser inputParser still does not type check.

let lipid = use "PA 17:1_18:1"
*Main> :t lipid
lipid :: ParsedLipid PA

Any advice on how to get around this problem?

score 7 · Answer 1 · edited Mar 20 '17 at 10:29

@MathematicalOrchid points out that you probably don't need GADTs and a plain sum type might be enough. You may have an XY problem but I don't know enough about your use case to make a firm judgment. This answer is about how to turn untyped data into a GADT.

As you note, your parsing function can't return a ParsedLipid a because that leaves the calling context free to choose a which doesn't make sense; a is in fact determined by the input data. And you can't return a ParsedLipid FA or a ParsedLipid PA, because the input data may be of either type.

Therefore, the standard trick when building a GADT from runtime data - when you don't know the type of the index in advance - is to use existential quantification.

data AParsedLipid = forall a. AParsedLipid (ParsedLipid a)

The type parameter a appears on the right-hand side of AParsedLipid but not on the left. A value of AParsedLipid is guaranteed to contain a well-formed ParsedLipid, but its precise type is kept secret. An existential type is a wrapper which helps us to translate from the messy, untyped real world to a clean, strongly-typed GADT.

It's a good idea to keep the existentially quantified wrappers pushed to the edges of your system, where you have to communicate with the outside world. You can write functions with signatures like ParsedLipid a -> a in your core model and apply them to data under an existential wrapper at the edges. You validate your input, wrap it in an existential type, and then process it safely using your strongly-typed model - which doesn't have to worry about errors, because you already checked your input.

You can unpack an AParsedLipid to get your ParsedLipid back, and pattern-match on that to determine at runtime what a was - it'll either be FA or PA.

showFA :: FA -> String
showFA = ...
showPA :: PA -> String
showPA = ...

showLipid :: AParsedLipid -> String
showLipid (AParsedLipid (ParsedFA x)) = "AParsedLipid (ParsedFA "++ showFA x ++")"
showLipid (AParsedLipid (ParsedPA x)) = "AParsedLipid (ParsedPA "++ showPA x ++")"

You'll note that a can't appear in the return type of a function taking an AParsedLipid either, for the reasons outlined above. The return type must be known to the compiler; this technique does not allow you to define a "function with different return types".

When you're constructing an AParsedLipid, you have to generate enough evidence to convince the compiler that the wrapped ParsedLipid is well-formed. In your example, this comes down to parsing a well-typed PA or FA and then wrapping it up.

parser :: Parser AParsedLipid
parser = AParsedLipid <$> (fmap ParsedFA faParser <|> fmap ParsedPA paParser)

GADTs are a bit awkward when used with runtime data. The existential wrapper effectively erases the extra compile-time information in a ParsedLipid - AParsedLipid is isomorphic to Either FA PA. (Proving that isomorphism in code is a good exercise.) In my experience GADTs are much better at structuring programs than at structuring data - they excel at implementing strongly-typed embedded domain-specific languages for which the types' indices can be known at compile time. For example, Yampa and extensible-effects both use a GADT as their central data type. This helps the compiler to check that you're using the domain-specific language correctly in the code you write (and in some cases allows certain optimisations). It's pretty unlikely that you'll be building FRP networks or monadic effects at runtime from real-world data.

If you build a GADT and then put it in an existential wrapper... doesn't that achieve exactly the same end result is a normal ADT wrapper? — MathematicalOrchid, Oct 29 '15 at 10:20
Yes, it does. (That's the point I was trying to make in the last paragraph.) On the other hand, if the OP has a bunch of strongly-typed functions over `ParsedLipid`, sooner or later he's going to have to turn the `Either FA PA` into an `AParsedLipid` and apply his functions under the existential wrapper. — Benjamin Hodgson, Oct 29 '15 at 10:20
In this instance, that's probably not the case, but you can conceive of times where it is - you might have some key invariants of your model which you're trying to enforce using dependent types (and pushing error handling to the edges of your system). For example, a `zip` function which only works over `Vec`tors of equal length. — Benjamin Hodgson, Oct 29 '15 at 10:28
The key place where GADTs seem to be useful if where you have something like a complex AST that is expected to obey certain structural invariants. However, the OP seems to just want to be able to pretend that multiple separate types are "the same", which GADTs don't really help with. That's why I didn't mention it. ;-) But no harm in having multiple suggestions... — MathematicalOrchid, Oct 29 '15 at 10:32
I agree. The OP seems to have an XY problem which your answer addresses better than mine. I'm just answering the direct question, which was about parsing GADTs ;) — Benjamin Hodgson, Oct 29 '15 at 10:35
Thanks for the responses. I guess it's back to the drawing board for me. I probably should go through a "build an interpreter" style exercise to get a better idea on how things should be done. — Michael T, Oct 29 '15 at 10:53

score 4 · Answer 2 · answered Oct 29 '15 at 09:04

4

What exactly do you want?

If you know at compile time whether you want an FA or a PA, then GADTs are a good way to do that.

If you want to decide at run-time to parse either an FA or a PA, you could use... Either FA PA.

answered Oct 29 '15 at 09:04

MathematicalOrchid

61,854
19
123
220

FA and PA are just two of many data types I want to parse and therefore I cannot use Either FA PA. I want to be able to write different parsers for all my data types (that represent different lipids) and then combine them into a single parser. – Michael T Oct 29 '15 at 09:17
2

Well, you *can*: it becomes `Either X1 (Either X2 (Either X3 ...`, which isn't fun. I suggest you define an ADT that covers all the cases you want to parse. – MathematicalOrchid Oct 29 '15 at 09:19
If I use a regular ADT, won't I run into trouble defining a function with different return types? For example, say I define a data type Test, `> data Test = Test1 Int | Test2 Float`, and a function evalTest `> let evalTest l = case l of {Test1 val -> val; Test2 val -> val}`, I get an error `Couldn't match expected type ‘Int’ with actual type ‘Float’`. – Michael T Oct 29 '15 at 09:35
1

There is no way of writing a function that returns different types at random. The return type must be 100% determined at compile-time. Perhaps you should edit the question (or maybe ask a new one) with more details on exactly what your end goal is. – MathematicalOrchid Oct 29 '15 at 09:45
4

Conceptually, going from a normal datatype to a GADT corresponds to typechecking. Doing typechecking during parsing is in general very difficult. As @MathematicalOrchid suggests, using a normal datatype for the parse result is usually adequate. You'd then possibly run an additional check on the parse result that, if successful, produces a value of the GADT as a witness. – kosmikus Oct 29 '15 at 11:25

Parsing and the use of GADTs

2 Answers2

Linked