0

The following regex:

(?!<script[^>]*>)\[(.*?)\](?![^<]*<\/script>)

Targets every [TEXT] and [INPUT] in the input string, except any [] inside a script tag.

I would now like to change this so that the exception applies only to a specific script tag, the one with id="special".

So <script id="special">[INPUT]</script> should not be targeted, while a script tag without that id, like <script>[INPUT]</script>, should be targeted along with the rest of the string.

I tried adding id="special" to the above regex before [^>]*>, but it doesn't work.

Karem
  • Why the -1 anyone? I would like to improve, but I have to know why? – Karem Jun 24 '17 at 09:33
  • @chris85 Thanks for your comment. The format is consistent, but it should "skip" matches in all script tags that have id="special". Tried your regex, works, although it doesn't match a new line with [INPUT] only (not wrapped in – Karem Jun 24 '17 at 12:18
  • So it should kind of be an exception to the regex matching. Everything inside this script should not be matched. I'm starting to think I explain pretty badly. Hope you understand. – Karem Jun 24 '17 at 12:19
  • Ah that is great! Works! Could this be improved/cleaned up? I'm not experienced with regex, but it seems overdone with a boolean? Also, could you submit this as an answer? It would be great to also expand on your comment about the HTML being unreliable - why (maybe an example? read more?) – Karem Jun 24 '17 at 12:28
  • Do you mean you want to first test if the string has `[]` in it before performing the regex? I've posted an answer for the initial question. – chris85 Jun 24 '17 at 12:41

2 Answers

0

You might be making this too complex.

If you only want to match a <script> element that has no attributes, you could allow optional whitespace with \s:

<\s*script\s*>\[(.*?)\]</\s*script\s*>

If the only attribute you need to exclude is 'id', you could use a negative lookahead:

<script(?![^>]*\sid=)[^>]*>\[(.*?)\]</script>

That will match <script only when it is NOT FOLLOWED by <whitespace>id= before the > character. Using [^>]* rather than .* keeps both the lookahead and the attribute match inside the opening tag.
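
To check the behaviour of the negative lookahead, here is a minimal Python sketch. It assumes the form of the pattern in which the lookahead is bounded with `[^>]*`, so that it cannot look past the closing `>`; the name `NO_ID_SCRIPT` and the sample strings are made up for illustration.

```python
import re

# Assumed, bounded form of the lookahead pattern: [^>]* stops at the end of
# the opening tag, so an id= elsewhere on the line cannot block the match.
NO_ID_SCRIPT = re.compile(r'<script(?![^>]*\sid=)[^>]*>\[(.*?)\]</script>')

# Hypothetical sample inputs.
samples = [
    '<script>[INPUT]</script>',
    '<script src="page">[INPUT]</script>',
    '<script id="special">[INPUT]</script>',
]

for sample in samples:
    match = NO_ID_SCRIPT.search(sample)
    # Print the captured token name, or None when the tag carries an id.
    print(sample, '->', match.group(1) if match else None)
```

Note that this only finds a [] that is wrapped directly in a script tag, and it rejects any id, not just "special".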

always-a-learner
  • Thanks for your contribution. Your second solution doesn't match any of the things I would like it to match: http://regexr.com/3g7qk – Karem Jun 24 '17 at 09:33
0

You can skip everything inside a script element with that id by using the PCRE verbs (*SKIP) and (*FAIL).

<script id="special">.*?<\/script>(*SKIP)(*FAIL)|\[[^\]]+?\]

Demo: https://regex101.com/r/PSMV15/5/

You can read more about this here, http://www.rexegg.com/backtracking-control-verbs.html#skipfail.
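
If you are not on PCRE, the same idea can be approximated. Python's built-in `re` module does not support these verbs, so the sketch below swaps in a different technique: the special script is matched by the first alternative and simply put back unchanged, while only the second alternative captures a token. The helper name `replace_tokens` and the lowercase replacement are assumptions for the example, not part of the original regex.

```python
import re

# Branch 1 consumes the whole special script (nothing is captured).
# Branch 2 captures the name inside a [] token.
PATTERN = re.compile(
    r'<script id="special">.*?</script>|\[([^\]]+)\]',
    re.DOTALL,  # let .*? cross newlines inside the script
)

def replace_tokens(text, replacer):
    """Replace every [NAME] outside <script id="special"> with replacer(NAME)."""
    def substitute(match):
        if match.group(1) is None:
            # Branch 1 matched: return the special script untouched.
            return match.group(0)
        return replacer(match.group(1))
    return PATTERN.sub(substitute, text)

html = (
    '[TEXT] <script>[INPUT]</script> '
    '<script id="special">[INPUT]</script>'
)
result = replace_tokens(html, lambda name: name.lower())
print(result)
```

Here group 1 plays the role of `$1`; it is empty (None) whenever the skipped branch matched.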

If the string is HTML, a parser should be used, because there can be all sorts of variations in the elements and attributes.

For example:

<script  id="special">
<script src="page" id="special">
<script src="page" id="special" class="why?">
<script id='special'>
<script id=special>
<script id=special src=page>

And that is without even getting into the issue of nested elements. Here's one thread on why regexes and HTML shouldn't go together: "RegEx match open tags except XHTML self-contained tags".
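
As a rough illustration of the parser route, here is a minimal sketch using Python's standard `html.parser`. The class name `TokenFinder`, the helper `find_tokens`, and the sample document are assumptions for the example. The parser normalises the quoting and ordering of attributes, so all of the variations listed above are treated the same way.

```python
from html.parser import HTMLParser
import re

TOKEN = re.compile(r'\[([^\]]+)\]')

class TokenFinder(HTMLParser):
    """Collects [NAME] tokens from text, skipping <script id="special">."""

    def __init__(self):
        super().__init__()
        self.in_special = False
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes already removed.
        if tag == 'script' and dict(attrs).get('id') == 'special':
            self.in_special = True

    def handle_endtag(self, tag):
        if tag == 'script':
            self.in_special = False

    def handle_data(self, data):
        # Text nodes and script bodies both arrive here.
        if not self.in_special:
            self.tokens.extend(TOKEN.findall(data))

def find_tokens(html):
    finder = TokenFinder()
    finder.feed(html)
    finder.close()
    return finder.tokens

# Hypothetical input mixing several attribute styles.
doc = (
    "[TEXT] <script>[INPUT]</script> "
    "<script src=\"page\" id='special'>[SKIPPED]</script> "
    "<script id=special>[SKIPPED]</script>"
)
print(find_tokens(doc))
```

This sketch only finds the tokens; replacing them in place would need the parser to rebuild the output as well.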

chris85
  • Thank you for this! Great! Lastly, the $1 is empty how can I solve this? I tried modifying the regex to: – Karem Jun 24 '17 at 23:04
  • 1
    There is no capture group so it is `$0`. The example you linked has capture group 1.. Where's the decimal entity and is that optional? – chris85 Jun 24 '17 at 23:06
  • My bad, got it working! In the original script, I was modifying the id="special" to also accept special2, but forgot ?: to only group and not capture. – Karem Jun 24 '17 at 23:11
  • Not sure I know what you mean but sounds like it is working for you.. so hooray – chris85 Jun 24 '17 at 23:22