
I would like to know what can be considered as a best practice regarding the State monad. I'm also open to any other suggestion.

I have a binary file to parse. It contains several headers that must be parsed before the rest of the file can be read.

The headers themselves can be parsed using only the parser state.

data ParseState = ParseState {
   offset :: Int64,
   buffer :: B.ByteString,
   endianness :: Endianness,
   pointerSize :: MachineWord,
   positionStack :: [Int64]
}

This data is then used in a State monad

type Parser a = State ParseState a

This suits the parsing of the header perfectly. But as soon as I want to parse the complete file, I need information from the header to read the file correctly.

data Header = Header {
    txtOffset :: Int64,
    stringOffset :: Int64
}

I need the header information to continue parsing the file.

My idea was to use a new state monad that sits on top of the previous one. So I have a new StateT monad:

type ParserFullState a = StateT Header (State ParseState) a

Thus I can continue and build a whole set of parser functions using the new state transformer. I could also do it differently and add the header to the original ParseState data.

The pros I can see in adding the header to ParseState are the following:

  1. The return type of the parser functions is uniform.
  2. There is no need to call lift to access the parser primitives.

The cons I can see are:

  1. There is no distinction between higher-level parsers and lower-level primitives.
  2. We cannot tell clearly whether the header has been fully parsed or not, which makes later modifications to the parser more fragile.
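
To make the second con concrete, here is a minimal sketch of the single-state variant. It assumes the header is stored as a `Maybe Header` field that stays `Nothing` until the header has been parsed; that field, and the helpers `setHeader`, `seekToText`, `currentOffset` and `run`, are hypothetical names chosen for illustration. The other `ParseState` fields are omitted, and the `transformers` package is assumed for `State`.

```haskell
import Control.Monad.Trans.State.Strict (State, evalState, gets, modify')
import Data.Int (Int64)

data Header = Header
  { txtOffset    :: Int64
  , stringOffset :: Int64
  } deriving (Eq, Show)

-- Single-state design: the header lives inside the parser state.
-- Assumption: it is Nothing until the header has been parsed.
data ParseState = ParseState
  { offset :: Int64
  , header :: Maybe Header
  }

type Parser = State ParseState

-- Hypothetical helper: record the header once it has been parsed.
setHeader :: Header -> Parser ()
setHeader h = modify' (\st -> st { header = Just h })

-- A parser that depends on the header must handle the
-- "not parsed yet" case at run time; the type does not rule it out.
seekToText :: Parser (Either String ())
seekToText = do
  mh <- gets header
  case mh of
    Nothing -> pure (Left "header not parsed yet")
    Just h  -> do
      modify' (\st -> st { offset = txtOffset h })
      pure (Right ())

-- Read back the current offset.
currentOffset :: Parser Int64
currentOffset = gets offset

-- Run a parser from offset 0 with no header available yet.
run :: Parser a -> a
run p = evalState p (ParseState 0 Nothing)
```

In this sketch, `run seekToText` yields `Left "header not parsed yet"`, while `run (setHeader (Header 16 32) >> seekToText >> currentOffset)` yields `16`. Whether the header is available is only discovered when the parser runs, which is the fragility described above.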

What is your suggestion? Should I use the state transformer, should I add the header to the original state, or something else?

Thanks.

leftaroundabout
mathk
  • Why not `data ParseState' = Initial ParseState | Later ParseState Header` or some other ADT that fits your problem so that you only have one state that can represent your initial parse state, then later when you need the `Header` you have both the `ParseState` and the `Header` information and they're kept separate? You just have to `put (Later parseState header)` once you have the header information. – bheklilr Aug 14 '14 at 13:40

1 Answer


Generally, I would advise against using multiple layers of State (or indeed of any transformer). Transformers are great, but in thicker clusters they do get confusing, especially when the type system can no longer decide which MonadState instance to use.

Nevertheless, in your specific case another transformer is actually a good idea, but not a StateT: the header information shouldn't change during the further parsing of the file, so it should really just be a ReaderT, shouldn't it?

type ParserFullState = ReaderT Header (State ParseState)

or equivalently

type ParserFullState = RWS Header () ParseState
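
As a rough illustration of the ReaderT variant, here is a minimal sketch. It assumes the `transformers` package with explicit `lift`, a lazy `ByteString` buffer (so that offsets are `Int64`), and a cut-down `ParseState`; the helpers `byteP`, `seekP`, `firstTextByte` and `runFull` are hypothetical names, not part of the question's code.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)
import Control.Monad.Trans.State.Strict (State, evalState, modify', state)
import qualified Data.ByteString.Lazy as B
import Data.Int (Int64)
import Data.Word (Word8)

-- Simplified versions of the question's types (other fields omitted).
data ParseState = ParseState
  { offset :: Int64
  , buffer :: B.ByteString
  }

data Header = Header
  { txtOffset    :: Int64
  , stringOffset :: Int64
  }

-- Low-level parsers only see the mutable parse state.
type Parser = State ParseState

-- High-level parsers additionally get read-only access to the header.
type ParserFullState = ReaderT Header Parser

-- Hypothetical primitive: read the byte at the current offset and
-- advance by one; Nothing if the offset is outside the buffer.
byteP :: Parser (Maybe Word8)
byteP = state $ \st ->
  let o   = offset st
      buf = buffer st
  in if o >= 0 && o < B.length buf
       then (Just (B.index buf o), st { offset = o + 1 })
       else (Nothing, st)

-- Hypothetical primitive: jump to an absolute offset.
seekP :: Int64 -> Parser ()
seekP o = modify' (\st -> st { offset = o })

-- High-level parser: look up txtOffset in the header, then reuse
-- the primitives through lift.
firstTextByte :: ParserFullState (Maybe Word8)
firstTextByte = do
  o <- asks txtOffset
  lift (seekP o)
  lift byteP

-- Run a full parser once the header is known.
runFull :: Header -> B.ByteString -> ParserFullState a -> a
runFull hdr bytes p = evalState (runReaderT p hdr) (ParseState 0 bytes)
```

For example, `runFull (Header 2 0) (B.pack [10, 20, 30, 40]) firstTextByte` evaluates to `Just 30`. The types keep the two levels apart: anything of type `Parser` cannot touch the header, and anything of type `ParserFullState` can only be run after a `Header` has been supplied.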
leftaroundabout
  • Note that transformers are only confusing if you use the `mtl` library. If you use the `transformers` library with explicit `lift`s it is straightforward. – Gabriella Gonzalez Aug 14 '14 at 17:05
  • @GabrielGonzalez: Not confusing to the type system, but it can still get pretty confusing to the *programmer*! – Tikhon Jelvis Aug 14 '14 at 22:46