I need a regex to find anchor tags and wrap their text in a span child, e.g.:

replace:

<a
            href="https://en.wikipedia.org/wiki/Edward_Seymour,_1st_Duke_of_Somerset"
            title="Edward Seymour, 1st Duke of Somerset"
            >Edward Seymour, 1st Duke of Somerset</a
          >

with:

<a
            href="https://en.wikipedia.org/wiki/Edward_Seymour,_1st_Duke_of_Somerset"
            title="Edward Seymour, 1st Duke of Somerset"
            ><span>Edward Seymour, 1st Duke of Somerset</span></a
          >

The search will be done in VS Code, so there will be newlines and carriage returns to contend with. The closest I have gotten is:

(.+?>){1}([a-z])*
John Kugelman
Alan

1 Answer

This is not perfect, because anchors could be nested inside each other and regexes are bad at keeping track of nested contexts. But a good 90% solution is to start by looking for <a and capture everything up to and including the next >. Then, in a separate capturing group, capture everything up to but not including the next </a. The final capturing group gets the </a, any optional whitespace (including newlines), and the next closing >.

string result = Regex.Replace(input, @"(<a[^>]+?>)(.+?)(</a[\s\r\n]*>)", "$1<span>$2</span>$3", RegexOptions.Singleline);

The $n tokens in the replacement string refer to the contents of the corresponding capture group ($1 is the opening tag, $2 the link text, $3 the closing tag). Result:

<a
        href="https://en.wikipedia.org/wiki/Edward_Seymour,_1st_Duke_of_Somerset"
        title="Edward Seymour, 1st Duke of Somerset"
        ><span>Edward Seymour, 1st Duke of Somerset</span></a
      >
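To try the same idea outside VS Code, here is a minimal Python sketch of the three-group substitution, using Python's `re` module as a stand-in for the .NET call. Python writes the backreferences as `\1`, `\2`, `\3` instead of `$1`, `$2`, `$3`, and `re.DOTALL` plays the role of `RegexOptions.Singleline`. The names `ANCHOR_RE` and `wrap_anchor_text` are illustrative only, and the sample uses `\r\n` line endings to mirror the carriage returns mentioned in the question.

```python
import re

# Same three groups as the answer: opening tag, link text, closing tag.
# \s already matches \r and \n, so [\s\r\n]* is shortened to \s* here.
# re.DOTALL lets "." in the link-text group cross line breaks.
ANCHOR_RE = re.compile(r"(<a[^>]+?>)(.+?)(</a\s*>)", re.DOTALL)


def wrap_anchor_text(html: str) -> str:
    """Wrap the text of every <a ...>...</a> in a <span> (hypothetical helper)."""
    # \1, \2, \3 refer to the captured opening tag, text and closing tag.
    return ANCHOR_RE.sub(r"\1<span>\2</span>\3", html)


if __name__ == "__main__":
    sample = (
        "<a\r\n"
        '    href="https://en.wikipedia.org/wiki/Edward_Seymour,_1st_Duke_of_Somerset"\r\n'
        '    title="Edward Seymour, 1st Duke of Somerset"\r\n'
        "    >Edward Seymour, 1st Duke of Somerset</a\r\n"
        "  >"
    )
    print(wrap_anchor_text(sample))
```

Like the original pattern, this still shares the caveats above: it does not understand nesting, and `<a[^>]+?>` would also match tags such as `<abbr title="...">`.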
Chris Maurer
  • Thanks. It works great when I try it at regex101.com, but not in VS Code's Find. Only anchor tags that aren't broken across multiple lines are found. I tried adding \r\n, but no luck. – Alan Jun 14 '21 at 16:24
  • There are several flags that tell the regex engine how to behave. The behavior you are describing is what you get when the single-line flag (/s, also called dotall) is off. Make sure it is on so that the dot also matches newlines; the multi-line flag (/m) only changes how ^ and $ match. – Chris Maurer Jun 15 '21 at 02:32
  • I'm sorry, I don't see how to do that. I tried adding /m to the end of the expression and then got no results. – Alan Jun 15 '21 at 14:18
  • I found documentation at https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.replace?view=net-5.0. Try calling it this way: Regex.Replace(input, @"(<a[^>]+?>)(.+?)(</a[\s\r\n]*>)", "$1<span>$2</span>$3", RegexOptions.Singleline) – Chris Maurer Jun 15 '21 at 16:13
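Which flag lets `.` cross line breaks can be checked directly. The following Python sketch uses Python's `re`, not the engine behind VS Code's Find widget, and a hypothetical anchor whose link text wraps onto a second line. `re.MULTILINE` only affects `^` and `$`, so `.` still stops at the newline. `re.DOTALL` (the single-line option) or a flag-free `[\s\S]` class lets the text group span lines.

```python
import re

# Hypothetical anchor whose link text wraps onto a second line.
html = '<a href="x"\n  >first line\nsecond line</a\n>'

PATTERN = r"(<a[^>]+?>)(.+?)(</a\s*>)"

# re.MULTILINE only changes what ^ and $ match; "." still stops at "\n",
# so the text group cannot reach "</a" and there is no match at all.
print(re.search(PATTERN, html, re.MULTILINE))  # None

# re.DOTALL (the "single-line" / s flag) lets "." also match "\n".
print(re.search(PATTERN, html, re.DOTALL).group(2))

# Flag-free alternative: [\s\S] matches any character, newline included.
FLAG_FREE = r"(<a[^>]+?>)([\s\S]+?)(</a\s*>)"
print(re.search(FLAG_FREE, html).group(2))
```

The `[\s\S]` form is a common workaround in tools that offer no dotall switch, since it does not depend on any flag being set.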