Efficient "Parser a -> ByteString -> [a]" function

Question

What is the most efficient way to search a large text (300K+) for all matches of an existing Attoparsec parser?

I have written the following code, but it performs slowly:

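import Data.Attoparsec.ByteString.Char8 (Parser, parseOnly)
import Data.ByteString.Char8 (pack)
import Data.Either (rights)

-- Try the parser at every suffix of the input and keep the successes.
findAll :: Parser a -> String -> [a]
findAll parser = rights . map (parseOnly parser . pack) . oneLess
  where
    -- All non-empty suffixes of the input, longest first.
    oneLess []           = []
    oneLess whole@(_:xs) = whole : oneLess xs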

It works on String, but I think a ByteString version would be best.

Parsing "abba" in "abbabba" should return only one match, ["abba"]; that is, after a match the search should continue from the end of that match.

Yes, `ByteString` or `Text` is almost always a better option than `String`. But it would be useful to know why your code is slow. Is memory filling up? Also, if you use the function `parseOnly` from the module [Data.Attoparsec.ByteString](https://hackage.haskell.org/package/attoparsec-0.12.1.1/docs/Data-Attoparsec-ByteString.html), your function becomes `findAll :: Parser a -> ByteString -> [a]` with few modifications. Use Pipes or Conduit if you want it to run in constant memory. — Sibi, Aug 20 '14 at 18:35
To clarify, if you have a parser that parses `"abba"` and an input string `"ababbabba"` you'd like `findAll` to return `["abba", "abba"]`? — cdk, Aug 20 '14 at 19:19
@cdk, ideally it should return only ["abba"], i.e. once the pattern matches, the search should continue after the whole match. — The_Ghost, Aug 23 '14 at 10:56
If I have a parser that parses "abba" and an input string "ababbabba", it should return ["abba"]. If the input string is "ababbaabba", it should return ["abba", "abba"]. — The_Ghost, Aug 25 '14 at 22:08
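
The behaviour described in the question and in the comments above (non-overlapping matches, resuming after each whole match) could be sketched roughly as follows. This is only a minimal sketch, not a benchmarked solution: it assumes a strict `ByteString` input and uses attoparsec's incremental `parse` and `feed` so that the unconsumed remainder is available after each match. The helper name `go` and the handling of parsers that succeed without consuming input are illustrative choices, not something stated in the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.ByteString.Char8 (IResult (..), Parser, feed, parse, string)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B

-- Collect all non-overlapping matches of a parser, scanning left to right.
findAll :: Parser a -> ByteString -> [a]
findAll p = go
  where
    go input
      | B.null input = []
      | otherwise =
          -- Feeding an empty chunk tells a parser that is waiting for more
          -- input that the end of input has been reached.
          case feed (parse p input) B.empty of
            Done rest x
              -- Match that consumed input: resume right after it.
              | B.length rest < B.length input -> x : go rest
              -- Assumption: a match that consumed nothing still counts,
              -- but we advance one byte to avoid looping forever.
              | otherwise -> x : go (B.drop 1 input)
            -- No match at this offset: skip one byte (an O(1) slice).
            _ -> go (B.drop 1 input)
```

With `string "abba"` as the parser, `findAll` yields `["abba"]` for `"ababbabba"` and `["abba", "abba"]` for `"ababbaabba"`, matching the examples in the comments. Unlike the `String` version, no suffix is re-packed: `B.drop` on a strict `ByteString` shares the underlying buffer, whereas `pack` on every suffix copies it, which is a likely cause of the quadratic slowdown.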
