I'm very new to Haskell. I'd like to be able to find some color expressions in a string. So let's say I have this list of expressions:

colorWords = ["blue", "green", "blue green"]

And I want to be able to get the locations of all of those, anywhere in a string, even if it's broken up by a linebreak, or if a hyphen separates it instead. So given a string like:

First there was blue     
and then there was Green,     
and then blue    
green all of a sudden, and not to mention blue-green

It should give the character offsets for "blue" (line 1), "green" (line 2), "blue green" (lines 3-4), and "blue-green" (line 4), something like:

[("blue", [20]), ("green", [40]), ("blue green", [50, 65])]

I can do this with regexes, but I've been trying to do it with a parser just as an exercise. I'm guessing it's something like:

import Text.ParserCombinators.Parsec

separator = spaces <|> "-" <|> "\n"

colorExp colorString = if (length (words colorString))>1 then 
  multiWordColorExp colorString
  else colorString

multiWordColorExp :: Parser -> String
multiWordColorExp colorString = do
  intercalate separator (words colorString)

But I have no idea what I'm doing, and I'm not really getting anywhere with this.

Jonathan
  • String search algorithms don't require a parser - is there a reason you're looking to solve this using parsec? Look up Boyer-Moore, for example, or Knuth-Morris-Pratt. If performance doesn't matter then you can just do the naive thing and check isPrefixOf, then drop a character and repeat. – Thomas M. DuBuisson Aug 27 '19 at 08:21
  • @ThomasM.DuBuisson The OP does say: ‘I’ve been trying to do it with a parser just as an exercise’. – bradrn Aug 27 '19 at 09:11
  • Parsers are most suitable when you can make sense of all the input data. They're not as great at extracting a little signal from a lot of noise. Certainly it's still possible, but it's not really their strong suit. – amalloy Aug 27 '19 at 16:11
  • For turning a list of strings into a parser, you might like [this question](https://stackoverflow.com/q/34356668/791604). It isn't hard to turn it case-insensitive and spaces-or-hyphens-insensitive; then just use `sepBy` or similar to iterate it. – Daniel Wagner Aug 27 '19 at 17:05
  • This is a good question and should not be downvoted. It is totally natural to want to do pattern-matching search with parsers, and good answers exist. – James Brock Sep 02 '19 at 03:41
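
The naive approach mentioned in the comments (check isPrefixOf, drop a character, repeat) can be sketched roughly as follows. This is only an illustration using base; the name `offsetsOf` is made up here, and it handles a single fixed needle, not the space-or-hyphen variants from the question.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf, tails)

-- | All character offsets at which the needle occurs in the haystack,
-- compared case-insensitively. (Hypothetical helper, not from any library.)
offsetsOf :: String -> String -> [Int]
offsetsOf needle haystack =
  [ i
  | (i, rest) <- zip [0 ..] (tails lowered) -- every suffix with its offset
  , map toLower needle `isPrefixOf` rest    -- does the needle start here?
  ]
  where
    lowered = map toLower haystack
```

For example, `offsetsOf "blue" "First there was blue\nand then blue"` gives `[16,30]`.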

1 Answer

We can find substring locations with a parser by using the sepCap combinator from replace-megaparsec.

Here's a solution to your example problem. It requires the packages megaparsec, replace-megaparsec, and containers, and it uses string', choice, getOffset, and try from Megaparsec.

import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Maybe
import Data.Either
import Data.Void (Void)
import qualified Data.Map.Strict as Map

let colorWords :: Parsec Void String (String, [Int])
    colorWords = do
            i <- getOffset
            c <- choice
                [ try $ string' "blue" >>
                        anySingle >>
                        string' "green" >>
                        pure "blue green"
                , try $ string' "blue" >> pure "blue"
                , try $ string' "green" >> pure "green"
                ]
            return (c,[i])

input = "First there was blue\nand then there was Green,\nand then blue\ngreen all of a sudden, and not to mention blue-green"

Map.toList $ Map.fromListWith mappend $ rights $ fromJust
    $ parseMaybe (sepCap colorWords) input
[("blue",[16]),("blue green",[103,56]),("green",[40])]
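
For comparison, here is a minimal sketch of the same search using only ReadP from base instead of Megaparsec. It is one possible variation, not the replace-megaparsec approach itself. The helper names (`charCI`, `wordCI`, `separator`, `phrase`, `matchHere`, `findColors`) are made up for illustration. Unlike `anySingle` above, which accepts any single character between the two words, `separator` here only accepts one hyphen or a run of whitespace.

```haskell
import Data.Char (isSpace, toLower)
import Text.ParserCombinators.ReadP
  (ReadP, char, munch1, pfail, readP_to_S, satisfy, (<++))

-- | Match one character case-insensitively.
charCI :: Char -> ReadP Char
charCI c = satisfy (\x -> toLower x == toLower c)

-- | Match a whole word case-insensitively.
wordCI :: String -> ReadP String
wordCI = mapM charCI

-- | One hyphen, or a run of whitespace (newlines included).
separator :: ReadP ()
separator = (() <$ char '-') <++ (() <$ munch1 isSpace)

-- | Turn a phrase such as "blue green" into a parser for its words joined
-- by separators. The result is the canonical phrase, not the matched text.
phrase :: String -> ReadP String
phrase p = case words p of
  []       -> pfail
  (w : ws) -> p <$ (wordCI w >> mapM_ (\x -> separator >> wordCI x) ws)

-- | Try each phrase at the start of the input, in the order given,
-- so longer phrases should be listed first.
matchHere :: [String] -> String -> Maybe (String, String)
matchHere ps s = case readP_to_S (foldr (<++) pfail (map phrase ps)) s of
  ((name, rest) : _) -> Just (name, rest)
  []                 -> Nothing

-- | Scan the whole input, collecting (phrase, offset) pairs.
findColors :: [String] -> String -> [(String, Int)]
findColors ps = go 0
  where
    go _ [] = []
    go i s@(_ : t) = case matchHere ps s of
      -- Skip past the matched text so matches do not overlap.
      Just (name, rest) -> (name, i) : go (i + length s - length rest) rest
      Nothing           -> go (i + 1) t
```

With the same `input` string as above, `findColors ["blue green", "blue", "green"] input` yields the pairs `("blue",16)`, `("green",40)`, `("blue green",56)`, and `("blue green",103)`, which can be grouped with `Map.fromListWith` in the same way.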
James Brock