I am working on a regex that matches all quoted strings (double " and single ' alike). The opening quote must be preceded by one of a specific set of characters, and the quoted string ends at the next quote of the same type or at the start of an HTML comment (<!--).
The rules of the game are:
- The HTML tag names themselves (e.g. "<a>") do not matter; the regex only takes the input from the attributes of the HTML element (<a all="of this in bold is the attribute section">)
- The regex must be able to find both single quotes (') and double quotes ("), but only end the match upon reaching its own quote type (\1), an HTML comment (<!--) or end of input ($).
- If an HTML comment is encountered the quote will be interrupted, but it is still considered a quote: <a id="works <!-- interrupted -->
- Only a specific set of characters may appear immediately before the opening quote; that character must match this pattern: [^\w!#£¤€´¨\-.:]
See this regex:
/[^\w!#£¤€´¨\-.:]('|")(.|\n)*?(\1|<!--|$)/
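To make the parts of that pattern easier to discuss, here is a minimal sketch that splits it into commented pieces. It assumes Python's re module, which may differ in details from the flavour used on regexpal, and the name QUOTE_RE is only for illustration.

```python
import re

# Hypothetical name; the pattern text is the one from the question.
QUOTE_RE = re.compile(
    r"[^\w!#£¤€´¨\-.:]"  # one character outside the disallowed set
    r"""('|")"""          # group 1: the opening quote, single or double
    r"(.|\n)*?"           # group 2: any characters incl. newlines, lazily
    r"(\1|<!--|$)"        # group 3: same quote type, comment start, or end of input
)

m = QUOTE_RE.search("<a ='hey' <!-- -->")
print(repr(m.group(0)))  # the whole match, including the preceding character
print(repr(m.group(1)))  # which quote type opened the match
```

Note that the character in front of the quote is consumed by the pattern, so it is part of the reported match.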
There's a problem in the DISALLOW section (on regexpal), though: the regex should never match there, because the character before the first quote is disallowed.
Thanks in advance for your help!
Clarification
Example here on regexpal.com. Everything - except the content under the DISALLOW section - is correct.
The desired result should be as follows. Bold indicates a match using the regex displayed above. The (many) HTML comments are there to end the HTML tags in a consistent way.
ALLOW
- <a '' <!-- -->
- <a $'' <!-- -->
- <a %'' <!-- -->
- <a &'' <!-- -->
- <a /'' <!-- -->
- <a ('' <!-- -->
- <a )'' <!-- -->
- <a {'' <!-- -->
- <a }'' <!-- -->
- <a ['' <!-- -->
- <a ]'' <!-- -->
- <a ='' <!-- -->
- <a ?'' <!-- -->
- <a +'' <!-- -->
- <a `'' <!-- -->
- <a |'' <!-- -->
- <a ^'' <!-- -->
- <a ~'' <!-- -->
- <a *'' <!-- -->
- <a ,'' <!-- -->
- <a ;'' <!-- -->
- <a <'' <!-- -->
- <a \'' <!-- -->
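The ALLOW list can also be checked outside regexpal. The following is a rough sketch, assuming Python's re module behaves like the regexpal flavour for this pattern; ALLOWED and first_match are made-up names for illustration.

```python
import re

QUOTE_RE = re.compile(r"""[^\w!#£¤€´¨\-.:]('|")(.|\n)*?(\1|<!--|$)""")

# Prefix characters from the ALLOW list; " " stands for the first entry,
# where only the space after the tag name precedes the quotes.
ALLOWED = [" ", "$", "%", "&", "/", "(", ")", "{", "}", "[", "]", "=",
           "?", "+", "`", "|", "^", "~", "*", ",", ";", "<", "\\"]

def first_match(line):
    """Return the text of the first match in line, or None."""
    m = QUOTE_RE.search(line)
    return m.group(0) if m else None

for ch in ALLOWED:
    line = "<a " + ch.strip() + "'' <!-- -->"
    # Expect the prefix character followed by the empty quoted string.
    assert first_match(line) == ch + "''", line
```

Each line is expected to match the prefix character plus the two quotes, and nothing beyond them.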
DISALLOW
- <a a'' <!-- -->
- <a 9'' <!-- -->
- <a !'' <!-- -->
- <a #'' <!-- -->
- <a £'' <!-- -->
- <a ¤'' <!-- -->
- <a €'' <!-- -->
- <a ´'' <!-- -->
- <a ¨'' <!-- -->
- <a -'' <!-- -->
- <a _'' <!-- -->
- <a .'' <!-- -->
- <a :'' <!-- -->
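To reproduce the DISALLOW behaviour outside regexpal, a small probe like the one below may help. It is only a sketch, assuming Python's re module; DISALLOWED and probe are hypothetical names.

```python
import re

QUOTE_RE = re.compile(r"""[^\w!#£¤€´¨\-.:]('|")(.|\n)*?(\1|<!--|$)""")

# Prefix characters from the DISALLOW list.
DISALLOWED = ["a", "9", "!", "#", "£", "¤", "€", "´", "¨", "-", "_", ".", ":"]

def probe(ch):
    """Return (start index, matched text) for one DISALLOW-style line, or None."""
    m = QUOTE_RE.search("<a " + ch + "'' <!-- -->")
    return (m.start(), m.group(0)) if m else None

results = {ch: probe(ch) for ch in DISALLOWED}
for ch, hit in results.items():
    print(repr(ch), hit)
```

Under these assumptions every line still yields a match, but it starts at the first quote (index 4) rather than at the disallowed character. The first quote is not in the disallowed set, so it is accepted as the character in front of the second quote, and the match then runs on to the <!--.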
WITH BOTH QUOTE TYPES
- <a single ='hey' double ="you" <!-- -->
STOP AT HTML COMMENT
- <a =' <!-- this will break both the quotation and the HTML tag -->
END OF INPUT
<a ='
this - on a new line - is still part of the quote
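The last three cases (both quote types, stopping at a comment, and running to end of input) can be checked the same way. This is a minimal sketch assuming Python's re module; the helper name matches is made up for illustration.

```python
import re

QUOTE_RE = re.compile(r"""[^\w!#£¤€´¨\-.:]('|")(.|\n)*?(\1|<!--|$)""")

def matches(text):
    """Return the text of every non-overlapping match, left to right."""
    return [m.group(0) for m in QUOTE_RE.finditer(text)]

# Both quote types in one tag: each quote closes on its own type.
both = matches("""<a single ='hey' double ="you" <!-- -->""")

# An unclosed quote is cut off where the HTML comment starts.
comment = matches("<a =' <!-- this will break both the quotation and the HTML tag -->")

# An unclosed quote with no comment runs across the newline to end of input.
unterminated = "<a ='\nthis - on a new line - is still part of the quote"
at_end = matches(unterminated)

print(both)
print(comment)
print(at_end)
```

finditer is used instead of findall here because findall would return tuples of the three capture groups rather than the full match text.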