I'm trying to hyperlink 400 or so keywords in a 50,000 word markdown document.
This is one of several steps in a Perl "build chain", so it would be ideal to do the hyperlinking in Perl as well.
I have a separate file containing all the keywords, which maps each one to the markdown fragment it should be replaced with, like this:
keyword::(keyword)[#heading-to-jump-to]
The above example implies that wherever "keyword" occurs in the source markdown document, it should be replaced by the markdown fragment "(keyword)[#heading-to-jump-to]".
Ignoring keywords that occur as substrings of other keywords, plural/singular forms, and ambiguous keywords, it's reasonably straightforward. But naturally, there are two additional constraints.
I need to match only instances of keyword which are:
- Not on a line beginning with #
- Not in the section directly below The Heading To Jump To (that is, where the nearest heading above the keyword is the one it would link to)
The plain English meaning of these is: don't match keywords in any headings, and don't replace keywords that are under the heading they would link to.
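To make the two rules concrete, here is a minimal sketch of them as a line-by-line check rather than as a regex. It is only an illustration: `slug` and `linkable_lines` are hypothetical helper names, and the sketch assumes anchors are formed by lowercasing the heading text and replacing spaces with hyphens, which may not match how the real document generates its anchors.

```perl
use strict;
use warnings;

# Hypothetical helper: turn heading text into an anchor. ASSUMPTION:
# anchors are the lowercased heading with spaces replaced by hyphens.
sub slug {
    my ($heading) = @_;
    my $s = lc $heading;
    $s =~ s/[^\w\s-]//g;    # drop punctuation
    $s =~ s/\s+/-/g;        # whitespace runs become single hyphens
    return $s;
}

# Hypothetical helper: for each line of $doc, return 1 if a keyword
# linking to $target (e.g. "the-frog") may be replaced on that line.
sub linkable_lines {
    my ($doc, $target) = @_;
    my $current = '';       # anchor of the nearest heading seen so far
    my @ok;
    for my $line (split /\n/, $doc) {
        if ($line =~ /^#+\s*(.*?)\s*$/) {
            $current = slug($1);    # entering a new section
            push @ok, 0;            # rule 1: never link inside a heading
        }
        else {
            # rule 2: never link under the heading being linked to
            push @ok, ($current eq $target ? 0 : 1);
        }
    }
    return @ok;
}

my $doc = "# Pond\nA Frog sat.\n## The Frog\nA Frog is green.\n";
print join(',', linkable_lines($doc, 'the-frog')), "\n";
```

For the four-line sample document, only the second line ("A Frog sat.") is flagged as linkable: the two heading lines are excluded by the first rule and the last line by the second.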
My Perl script reads the $keyword::$link pairs and then, pair by pair, substitutes them into a regex, and then searches/replaces the document with that regex.
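The loop described above can be sketched roughly as follows. This is a simplified illustration, not the actual script: `parse_pairs` and `link_all` are hypothetical names, and the substitution here is a plain whole-word match that ignores both constraints.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "keyword::replacement" lines into pairs.
sub parse_pairs {
    my ($text) = @_;
    my @pairs;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;                  # skip blank lines
        my ($keyword, $link) = split /::/, $line, 2;
        next unless defined $link;                  # skip malformed lines
        push @pairs, [$keyword, $link];
    }
    return @pairs;
}

# Hypothetical helper: pair by pair, build a regex from the keyword and
# replace every whole-word occurrence. Neither constraint is applied.
sub link_all {
    my ($doc, @pairs) = @_;
    for my $pair (@pairs) {
        my ($keyword, $link) = @$pair;
        # \Q...\E quotes any regex metacharacters in the keyword
        $doc =~ s/\b\Q$keyword\E\b/$link/g;
    }
    return $doc;
}

my @pairs = parse_pairs("Frog::(Frog)[#the-frog]\n");
print link_all("A Frog sat.\n", @pairs);
```

One caveat with this naive form is that a later keyword can match inside a fragment inserted by an earlier pair, which is part of why the real regex needs to be more careful.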
I've written a regex that does the matching (for the cases I've manually tested so far) using RegexBuddy's JGsoft regex flavor. It looks like this:
Frog::(Frog)[#the-frog]
-->
([Ff]rog'?s?'?)(?=[\.!\?,;: ])(?<!#+ [\w ]*[Ff]rogs?)(?<!#+ the-frog)(?<!#+ the-frog[^#]*)
The problem (or, perhaps, a problem) with this is that it uses variable-length lookbehinds, which Perl does not support. So I can't even test this regex on the full document to see if it really works.
I've read a bunch of other posts on how to work around variable-length lookbehinds, but I can't seem to get it right for my particular case. Can any of the resident regex wizards help out with a neater regex that will execute in Perl?