Is there some "easy" way (e.g. something I am missing in Attoparsec or some other library) to convert an existing Attoparsec parser that parses from ByteString into one that parses from Text?

For example I have:

import Data.Attoparsec.ByteString.Char8
myTypeByteStringParser :: Parser MyType

What's the way to transform it into:

import Data.Attoparsec.Text
myTypeTextParser :: Parser MyType

It does look like contramap (from searching the type signature on Hoogle), but it is probably not possible to define Contravariant for Parser?
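To illustrate why a plain contramap is probably not enough, here is a minimal sketch using a toy parser type. `P`, `mapInput`, `digitsS` and `digitsT` are hypothetical names made up for this illustration; attoparsec's real `Parser` is continuation-based and looks quite different. The point is only that the input type shows up both as an argument and in the leftover, so a conversion seems to need a function in each direction.

```haskell
import Data.Char (isDigit)
import qualified Data.Text as T

-- Toy parser type (an assumption for illustration only, not attoparsec's Parser):
-- consume some input, return a value and the unconsumed rest.
newtype P i a = P { runP :: i -> Maybe (a, i) }

-- The input type i occurs as an argument AND in the result (the leftover),
-- so changing it needs one function per direction, unlike contramap.
mapInput :: (i -> j) -> (j -> i) -> P i a -> P j a
mapInput to from (P run) = P $ \j ->
  case run (from j) of
    Nothing        -> Nothing
    Just (a, rest) -> Just (a, to rest)

-- A String parser for a run of decimal digits.
digitsS :: P String Int
digitsS = P $ \s -> case span isDigit s of
  ("", _)    -> Nothing
  (ds, rest) -> Just (read ds, rest)

-- The same parser, re-targeted at Text via pack/unpack.
digitsT :: P T.Text Int
digitsT = mapInput T.pack T.unpack digitsS

main :: IO ()
main = print (runP digitsT (T.pack "123 abc"))
```

With ByteString and Text, the two directions would presumably be encodeUtf8 and decodeUtf8.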

esp
  • Do you control the definition of that parser? Would it work to make it polymorphic, i.e. `myTypeParser :: forall i. Parser i MyType`? – Fyodor Soikin Jan 29 '21 at 21:02
  • But `ByteString` and `Text` are fundamentally different. One is bytes, the other is characters! What are you trying to do? – Daniel Wagner Jan 30 '21 at 00:34
  • attoparsec doesn't provide polymorphic components - they are tied to a particular type... It is probably not possible... – esp Jan 30 '21 at 19:15
  • re Text/Bytestring being different - I meant to make this Parser conversion based on encodeUtf8/decodeUtf8 conversion – esp Jan 30 '21 at 19:16
  • @esp Why not just apply `encodeUtf8`/`decodeUtf8` to the data instead of the parser, then? – Daniel Wagner Jan 31 '21 at 15:56
  • The original idea was to use an existing ByteString parser as a component in a Text parser... So I couldn't just apply it to the whole data - I thought to somehow make a Text parser out of a ByteString parser and use it in another. I ended up just doing it differently. It would probably be possible if Parser functions were defined as class methods, so that the parser stays generic - attoparsec is not done this way, and I don't know if there is one that is. – esp Jan 31 '21 at 22:19
  • Megaparsec parsers are [parametrized on the stream type `s`](https://hackage.haskell.org/package/megaparsec-9.0.1/docs/Text-Megaparsec.html#t:ParsecT) – James Brock Jul 05 '21 at 06:20

2 Answers

I'm not sure this is possible in general. The Parser type defined in Attoparsec doesn't look like it plays nicely with modifying the input type. So, if you want to combine a Text parser with a ByteString parser, I'm afraid you may be out of luck.

That said, if what you want is to be able to run a ByteString parser on some input Text, you might be able to get around that by first converting the Text input into a ByteString. For instance:

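import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8)
import Data.Attoparsec.ByteString.Char8

-- parse :: Parser a -> ByteString -> Result a
-- this is given by Attoparsec

-- Run a ByteString parser on Text input by encoding the input as UTF-8 first.
parseText :: Parser a -> Text -> Result a
parseText p = parse p . encodeUtf8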

Similarly, you can run a Text parser on ByteString input by applying decodeUtf8 to the input first (or a different encoder/decoder as necessary).
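A minimal sketch of that converse direction follows. The helper name `parseByteStringWith` is hypothetical (it is not part of attoparsec), and the sketch assumes the input is meant to be UTF-8. It uses `decodeUtf8'` rather than `decodeUtf8` so that invalid bytes come back as a `Left` instead of an exception, and `parseOnly` to keep the result type simple.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Attoparsec.Text as AT
import Data.ByteString (ByteString)
import Data.Text.Encoding (decodeUtf8')

-- Hypothetical helper: decode the bytes as UTF-8, then run a Text parser.
-- Decoding errors and parse errors are both reported as Left.
parseByteStringWith :: AT.Parser a -> ByteString -> Either String a
parseByteStringWith p bs =
  case decodeUtf8' bs of
    Left err -> Left ("invalid UTF-8: " ++ show err)
    Right t  -> AT.parseOnly p t

main :: IO ()
main = print (parseByteStringWith AT.decimal "42" :: Either String Int)
```

Note that this converts the input, not the parser, so it does not help when the converted parser has to be used as a component inside another parser.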

DDub
  • Yes - I thought about exactly that - I can apply it to parsing function but not to the parser itself... – esp Jan 30 '21 at 19:14
  • You can probably do something similar to the internals, but it will likely be terrible for performance, and it will definitely require you to fork `attoparsec` (you'll need access to the unexposed `Buffer` types). I'm not sure what your actual use case is here, but you probably want to find another way to solve your problem. – DDub Jan 30 '21 at 23:19

This is possible in general, and you don't need to fork attoparsec. Inconsiderately, attoparsec doesn't expose enough of its internals, but don't let that stop us:

{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE QuasiQuotes #-}

module Parsers where

import qualified Data.Attoparsec.ByteString as AB
import qualified Data.Attoparsec.Internal.Types as AIT
import qualified Data.Attoparsec.Text as AT
import Data.ByteString (ByteString)
import qualified Data.ByteString.Internal as BI
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8, encodeUtf8)
import qualified Data.Text.Internal as TI
import Unsafe.TrueName

bsToTextState :: AIT.State ByteString -> AIT.State Text
bsToTextState = bufferText . decodeUtf8 . unbufferBS where
    unbufferBS :: AIT.State ByteString -> ByteString
    unbufferBS [truename| ''AIT.State
        Data.Attoparsec.ByteString.Buffer.Buffer
        Buf | fp off len _ _ |] = BI.PS fp off len
    bufferText :: Text -> AIT.State Text
    bufferText (TI.Text arr off len) = [truename| ''AIT.State
        Data.Attoparsec.Text.Buffer.Buffer
        Buf |] arr off len len 0

textToBSState :: AIT.State Text -> AIT.State ByteString
textToBSState = bufferBS . encodeUtf8 . unbufferText where
    unbufferText :: AIT.State Text -> Text
    unbufferText [truename| ''AIT.State
        Data.Attoparsec.Text.Buffer.Buffer
        Buf | arr off len _ _ |] = TI.Text arr off len
    bufferBS :: ByteString -> AIT.State ByteString
    bufferBS (BI.PS fp off len) = [truename| ''AIT.State
        Data.Attoparsec.ByteString.Buffer.Buffer
        Buf |] fp off len len 0

mapIResult :: (i -> j) -> (j -> i) -> AIT.IResult i a -> AIT.IResult j a
mapIResult f g = go where
    go = \case
        AIT.Fail i ctx msg -> AIT.Fail (f i) ctx msg
        AIT.Partial k -> AIT.Partial (go . k . g)
        AIT.Done i r -> AIT.Done (f i) r

mapFailure :: (i -> j) -> (j -> i) -> (AIT.State j -> AIT.State i) ->
    AIT.Failure i (AIT.State i) r -> AIT.Failure j (AIT.State j) r
mapFailure f g h k st p m ctx msg = mapIResult f g $ k (h st) p m ctx msg

mapSuccess :: (i -> j) -> (j -> i) -> (AIT.State j -> AIT.State i) ->
    AIT.Success i (AIT.State i) a r -> AIT.Success j (AIT.State j) a r
mapSuccess f g h k st p m a = mapIResult f g $ k (h st) p m a

bsToTextParser :: AB.Parser a -> AT.Parser a
bsToTextParser (AIT.Parser bsP) = AIT.Parser textP where
    textP st p m f s = mapIResult decodeUtf8 encodeUtf8 $ bsP
        (textToBSState st) p m
        (mapFailure encodeUtf8 decodeUtf8 bsToTextState f)
        (mapSuccess encodeUtf8 decodeUtf8 bsToTextState s)

textToBSParser :: AT.Parser a -> AB.Parser a
textToBSParser (AIT.Parser textP) = AIT.Parser bsP where
    bsP st p m f s = mapIResult encodeUtf8 decodeUtf8 $ textP
        (bsToTextState st) p m
        (mapFailure decodeUtf8 encodeUtf8 textToBSState f)
        (mapSuccess decodeUtf8 encodeUtf8 textToBSState s)

{,un}buffer{BS,Text} are adapted from the respective internal modules Data.Attoparsec.{ByteString,Text}.Buffer.

Was a good excuse for me to update true-name to work with more recent GHC though. Depending on how up-to-date you are, you may need the WIP from GitHub.

It's probably not terrible for performance, as long as you keep in mind that each time you use textToBSParser, the entire input gets fed through encodeUtf8 with any leftover converted back via decodeUtf8, and vice versa for bsToTextParser. If you only convert a Parser once at the top-level, it shouldn't be too different from simply converting the input as the other answer suggests.

PS: I haven't tested this beyond

$ ghci -XOverloadedStrings parsers.hs 
*Parsers> textToBSParser AT.scientific `AB.parseTest` "123 "
Done " " 123.0

PPS: for your own parsers, you might be able to leverage OverloadedStrings and write p :: IsString s => AIT.Parser s a instead, with {-# SPECIALISE p :: AT.Parser a #-} pragmas. I've not explored how workable this idea is.
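A related way to keep your own parsers generic, sketched below, is a small type class over the input type instead of `IsString`. This is a different technique from the one in the PPS, and the names `ParserInput`, `charP`, `decimalP` and `pairP` are hypothetical; attoparsec itself does not provide such a class. The sketch only relies on `AB8.Parser` and `AT.Parser` being synonyms for `AIT.Parser ByteString` and `AIT.Parser Text`.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Attoparsec.ByteString.Char8 as AB8
import qualified Data.Attoparsec.Internal.Types as AIT
import qualified Data.Attoparsec.Text as AT
import Data.ByteString (ByteString)
import Data.Text (Text)

-- Hypothetical class: the primitives a parser needs, one instance per input type.
class ParserInput i where
  charP    :: Char -> AIT.Parser i Char
  decimalP :: AIT.Parser i Int

instance ParserInput ByteString where
  charP    = AB8.char
  decimalP = AB8.decimal

instance ParserInput Text where
  charP    = AT.char
  decimalP = AT.decimal

-- Written once against the class, usable at both input types.
pairP :: ParserInput i => AIT.Parser i (Int, Int)
pairP = (,) <$> decimalP <* charP ',' <*> decimalP

main :: IO ()
main = do
  print (AB8.parseOnly pairP "1,2")  -- input is a ByteString here
  print (AT.parseOnly pairP "3,4")   -- input is a Text here
```

The cost is that every primitive you use has to be added to the class by hand, and `SPECIALISE` pragmas may still be worth adding for performance.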

liyang
  • Thank you. Got some urge to make a generic fork of attoparsec that doesn't commit to the source type for longer :) – esp Feb 04 '21 at 13:17
  • They say you can't make something both Functor and Contravariant if you obey all laws (it goes over my head tbh), but it seems like you should be able to make polymorphic parsers... – esp Feb 04 '21 at 13:19