
I want to add markdown to key phrases in a Gollum wiki page so that each phrase links to the relevant wiki page, in the form:

This is the key phrase.

Becomes

This is the [[key phrase|Glossary#key phrase]].

I have a list of key phrases such as:

keywords = ["golden retriever", "pomeranian", "cat"]

And a document:

Sue has 1 golden retriever. John has two cats.
Jennifer has one pomeranian. Joe has three pomeranians.

I want to iterate over every line and find every match (that isn't already a link) for each keyword. My current attempt, where `glosses` holds the keywords and `page` is the name of the glossary page, looks like this:

File.foreach(target_file) do |line|
    glosses.each do |gloss|
        len = gloss.length
        # Create the regex. Avoid anything that starts with [
        # or (, ends with ] or ), and ignore case.
        re = /(?<![\[\(])#{gloss}(?![\]\)])/i
        # Find every instance of this gloss on this line.
        positions = line.enum_for(:scan, re).map { Regexp.last_match.begin(0) }
        positions.each do |pos|
            line.insert(pos, "[[")
            # +2 because we just inserted 2 ahead.
            line.insert(pos+len+2, "|#{page}\##{gloss}]]")
        end
    end
    puts line
end

However, this runs into a problem when the same key phrase matches twice on one line. Because I insert text into the line, the positions I found for the later matches are no longer accurate after the first insertion. I know I could adjust for the size of my insertions each time, but because the insertions are a different size for each gloss, that seems like a brute-force, hacky solution.

Is there a solution that allows me to make multiple insertions on the same line at the same time without several arbitrary adjustments each time?

Nich Del
  • Like [this](https://regex101.com/r/qY5zV6/2)? – Bryce Drew Jun 30 '16 at 22:13
  • @BryceDrew Thanks for the response. That seems mostly correct, but it doesn't use any look-ahead or look-behind assertions, which would prevent [adding the link to existing links](https://regex101.com/r/cU3qI1/3). Ideally my script would be run on a document after it was manually updated, to add new links (without messing with existing ones). – Nich Del Jun 30 '16 at 22:35
  • @BryceDrew I've found my answer, largely based on your example. Many thanks! – Nich Del Jun 30 '16 at 23:02

1 Answer


After looking at @BryceDrew's online Python version, I realized Ruby probably also has a way to fill in the match. I now have a much more concise and faster solution.

First, I needed to make regexes of my glosses:

glosses.push(/(?<![\[\(])#{gloss}(?![\]\)])/i)

Note: The majority of that regex is look-ahead and look-behind assertions to prevent catching a phrase that's already part of a link.
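To make the note above concrete, here is a minimal sketch of how the two assertions behave. The sample strings and the single gloss are made up for illustration; only the pattern itself comes from the answer.

```ruby
# Hypothetical gloss and sample lines; the pattern is the one shown above.
gloss = "cat"
re = /(?<![\[\(])#{gloss}(?![\]\)])/i

plain  = "John has two cats."
linked = "John has two [[cat|Glossary#cat]]s."
nested = "I like [[my cat|Glossary#my cat]]."

# The bare phrase is found once.
plain_hits = plain.scan(re)

# Inside an existing link, the first "cat" follows "[" and the
# second is followed by "]", so both are skipped.
linked_hits = linked.scan(re)

# The assertions only inspect the characters directly next to the
# match, so a phrase in the middle of link text is still found.
nested_hits = nested.scan(re)

puts plain_hits.inspect
puts linked_hits.inspect
puts nested_hits.inspect
```

The last case suggests the guard is probably only reliable for links that this same script produced, where the phrase sits directly against the brackets.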

Then, I needed to make a union of all of them:

re = Regexp.union(glosses)
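As a rough sketch of what the union step does, the snippet below builds the per-phrase regexes from the keyword list in the question and combines them. The `keywords` array and the use of `Regexp.escape` are assumptions added for illustration; the answer does not show how `glosses` was filled.

```ruby
# Assumed input: the keyword list from the question.
keywords = ["golden retriever", "pomeranian", "cat"]

# One regex per phrase. Regexp.escape is an added precaution in case
# a phrase contains regex metacharacters.
glosses = keywords.map do |phrase|
  /(?<![\[\(])#{Regexp.escape(phrase)}(?![\]\)])/i
end

# Regexp.union joins the patterns with "|". Each sub-pattern keeps
# its own flags, so matching stays case-insensitive.
re = Regexp.union(glosses)

first = re.match("Joe has three Pomeranians. Sue has a cat.")
puts first[0]
```

Alternatives are tried in the order given, so if one phrase is a prefix of another (say "cat" and "cat flap"), listing the longer phrase first is probably the safer choice.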

After that, it's as simple as doing gsub on every line, and filling in my matches:

File.foreach(target_file) do |line|
  puts line.gsub(re) { |match| "[[#{match}|Glossary##{match.downcase}]]" }
end
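
Putting the steps together, here is a self-contained sketch run against the sample document from the question. It reads from a string instead of a file so it can be run as-is; `keywords`, `page`, and `document` are illustrative names, and `Regexp.escape` is an added assumption.

```ruby
keywords = ["golden retriever", "pomeranian", "cat"]
page = "Glossary" # assumed name of the glossary page

glosses = keywords.map do |phrase|
  /(?<![\[\(])#{Regexp.escape(phrase)}(?![\]\)])/i
end
re = Regexp.union(glosses)

document = <<~DOC
  Sue has 1 golden retriever. John has two cats.
  Jennifer has one pomeranian. Joe has three pomeranians.
DOC

# gsub replaces every match in one pass, so there are no positions
# to keep in sync when a phrase appears twice on the same line.
linked = document.each_line.map { |line|
  line.gsub(re) { |match| "[[#{match}|#{page}##{match.downcase}]]" }
}.join

puts linked

# Running the same substitution on the output should change nothing,
# because every phrase now sits directly against a bracket.
second_pass = linked.gsub(re) { |match| "[[#{match}|#{page}##{match.downcase}]]" }
```

Because the replacement text is computed per match inside the block, insertions of different lengths for different glosses need no offset bookkeeping.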
Nich Del
  • You probably want to put a word boundary on either side of your regular expression to avoid catching e.g. "catapult" for "cat." Something like this: `re = /\b#{Regexp.union(glosses)}\b/`. – Jordan Running Jul 01 '16 at 01:20
  • @Jordan I've thought about this but I do want to catch plurals and verb endings, so it's a trade-off between false negatives and false positives. – Nich Del Jul 01 '16 at 01:32
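
The trade-off discussed in these comments can be sketched with a few variants of a single pattern. The sample sentence and the two middle-ground patterns are hypothetical and not something either commenter proposed.

```ruby
# Made-up sentence containing a plural and two unrelated words.
text = "Two cats sat on a catapult and tried to concatenate."

loose   = /cat/i               # no boundaries, as in the answer
strict  = /\bcat\b/i           # boundaries on both sides
leading = /\bcat/i             # hypothetical: boundary at the start only
plural  = /\bcat(?:s|es)?\b/i  # hypothetical: allow a plural suffix

loose_hits   = text.scan(loose)
strict_hits  = text.scan(strict)
leading_hits = text.scan(leading)
plural_hits  = text.scan(plural)

puts "loose:   #{loose_hits.length}"
puts "strict:  #{strict_hits.length}"
puts "leading: #{leading_hits.length}"
puts "plural:  #{plural_hits.inspect}"
```

The loose pattern picks up "catapult" and "concatenate" (false positives), the strict one misses "cats" (a false negative), and the two variants in between trade one kind of error for the other.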