Regex with tag works, but not with specific id?

Question

The following regex:

(?!<script[^>]*>)\[(.*?)\](?![^<]*<\/script>)

It matches every [TEXT] and [INPUT] in the input string, except any [] inside a script tag.

I would now like to change this so that the exception applies only to a specific script with id="special".

So <script id="special">[INPUT]</script> should not be matched, while a script tag without that id, like <script>[INPUT]</script>, should be matched along with the rest of the string.

I tried adding id="special" to the regex above, before [^>]*>, but it doesn't work.

Why the -1, anyone? I would like to improve the question, but I need to know why. – Karem, Jun 24 '17 at 09:33
@chris85 Thanks for your comment. The format is consistent, but it should "skip" matching inside all script tags that have id="special". I tried your regex and it works, although it doesn't match a new line with only [INPUT] (not wrapped in – Karem, Jun 24 '17 at 12:18
So it should be a kind of exception to the regex matching: nothing inside this script should be matched. I'm starting to think I explained this pretty badly. Hope you understand. – Karem, Jun 24 '17 at 12:19
Ah, that is great, it works! Could this be improved or cleaned up? I'm not experienced with regex, but using a boolean seems like overkill. Also, could you submit this as an answer? It would be great to also expand on your comment about HTML being unreliable: why (maybe an example, or further reading)? – Karem, Jun 24 '17 at 12:28
Do you mean you want to first test if the string has `[]` in it before performing the regex? I've posted an answer to the initial question. – chris85, Jun 24 '17 at 12:41

score 0 · Answer 1 · edited Jun 24 '17 at 03:55

You might be making this too complex.

If you only want to match <script> elements that have no attributes at all, you can use \s* to allow optional whitespace:

<\s*script\s*>\[(.*?)\]</\s*script\s*>

If the only attribute you need to exclude is id, you can use a negative lookahead:

<script(?!.*\sid=).*>\[(.*?)\]</script>

That will match <script when it is NOT FOLLOWED by <whitespace>id= before the > character. For more help, visit this link.
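A quick check of both patterns, assuming Python's re module as a stand-in for whatever engine the question uses (the sample string is hypothetical). It also shows why the lookahead version can match nothing on a one-line input: .* is not limited to the opening tag, so the lookahead scans to the end of the line and an id= in a later tag rejects the earlier one too. The third pattern is a variant, not the answer's original, that uses [^>]* so the check stays inside the opening tag.

```python
import re

# Hypothetical one-line input: a plain script followed by the special one.
sample = '<script>[A]</script><script id="special">[B]</script>'

# First pattern: <script> with no attributes, optional whitespace allowed.
no_attributes = re.compile(r'<\s*script\s*>\[(.*?)\]</\s*script\s*>')

# Second pattern as written: .* lets the lookahead run to the end of the line,
# so the " id=" in the second tag also blocks the first <script>.
lookahead_any = re.compile(r'<script(?!.*\sid=).*>\[(.*?)\]</script>')

# Variant: [^>]* keeps both the lookahead and the attribute skip inside
# the opening tag.
lookahead_tag = re.compile(r'<script(?![^>]*\sid=)[^>]*>\[(.*?)\]</script>')

print(no_attributes.findall(sample))  # ['A']
print(lookahead_any.findall(sample))  # []
print(lookahead_tag.findall(sample))  # ['A']
```

Note that both attribute-based patterns only match a script whose entire body is one bracketed token.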

edited Jun 24 '17 at 03:55

always-a-learner

answered Jun 24 '17 at 03:24

Curtis Boyden

Thanks for your contribution. Your second solution doesn't match anything I would like it to match: http://regexr.com/3g7qk – Karem Jun 24 '17 at 09:33

chris85 · Accepted Answer · 2017-06-24T12:49:06.787

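You can skip everything inside a script element with that id by using the PCRE verbs (*SKIP) and (*FAIL). The left alternative matches the whole special script, then (*FAIL) forces that attempt to fail; because of (*SKIP), the engine does not retry at any position inside the script but resumes right after it, so only brackets outside the special script are matched by the right alternative.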

<script id="special">.*?<\/script>(*SKIP)(*FAIL)|\[[^\]]+?\]

Demo: https://regex101.com/r/PSMV15/5/

You can read more about this here: http://www.rexegg.com/backtracking-control-verbs.html#skipfail.
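Python's built-in re module does not support (*SKIP)(*FAIL), so this sketch swaps in a related technique rather than reproducing the PCRE pattern: put the special script in an alternative that captures nothing, capture the bracket text in the other alternative, and keep only matches where group 1 took part. The find_brackets helper and the sample string are hypothetical; re.DOTALL plays the role of PCRE's s modifier so .*? can cross newlines inside the script.

```python
import re

# Left side consumes the whole special script and captures nothing;
# right side captures the text inside [ ].
PATTERN = re.compile(
    r'<script id="special">.*?</script>'
    r'|\[([^\]]+?)\]',
    re.DOTALL,  # let .*? span newlines inside the special script
)


def find_brackets(text: str) -> list[str]:
    """Return bracket contents found outside <script id="special">...</script>."""
    # Matches of the left alternative have group(1) == None and are dropped.
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


sample = '[TEXT]\n<script id="special">\n[INPUT]\n</script>\n<script>[INPUT]</script>'
print(find_brackets(sample))  # ['TEXT', 'INPUT']
```

For this input the outcome matches the PCRE version: the special script is consumed as one match and discarded, so its [INPUT] never reaches the bracket alternative.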

If the string is HTML, a parser should be used, because there can be all sorts of variations in the elements and attributes.

For example:

<script  id="special">
<script src="page" id="special">
<script src="page" id="special" class="why?">
<script id='special'>
<script id=special>
<script id=special src=page>

And that's without even getting into the issue of nested elements. Here's one thread on why regexes and HTML shouldn't go together: RegEx match open tags except XHTML self-contained tags
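To illustrate the parser route, here is a minimal sketch using Python's standard-library html.parser; BracketCollector and collect are hypothetical names. The parser reports attributes as (name, value) pairs with the quotes already removed, so the quoting and ordering variations listed above all reduce to the same ("id", "special") check. It assumes bracket tokens never straddle a tag boundary and makes no attempt to handle malformed markup.

```python
import re
from html.parser import HTMLParser

# Same bracket pattern as the regex answer, applied only to text content.
BRACKET = re.compile(r'\[[^\]]+?\]')


class BracketCollector(HTMLParser):
    """Hypothetical helper: collect [TOKENS] outside <script id="special">."""

    def __init__(self) -> None:
        super().__init__()
        self.in_special = False
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs with quotes stripped, so
        # id="special", id='special' and id=special all compare equal here.
        if tag == "script" and ("id", "special") in attrs:
            self.in_special = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_special = False

    def handle_data(self, data):
        # Script bodies also arrive here as raw text; skip the special one.
        if not self.in_special:
            self.found.extend(BRACKET.findall(data))


def collect(markup: str) -> list[str]:
    parser = BracketCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return parser.found


page = (
    '<p>[TEXT]</p>'
    '<script src="page" id=special class="why?">[SKIP]</script>'
    "<script id='special'>[SKIP]</script>"
    '<script>[INPUT]</script>'
)
print(collect(page))  # ['[TEXT]', '[INPUT]']
```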

edited Jun 24 '17 at 12:49

answered Jun 24 '17 at 12:40

chris85

Thank you for this, great! Lastly, $1 is empty; how can I solve this? I tried modifying the regex to: – Karem Jun 24 '17 at 23:04
There is no capture group, so it is `$0`. The example you linked has capture group 1. Where's the decimal entity, and is that optional? – chris85 Jun 24 '17 at 23:06
My bad, I got it working! In the original script I had modified id="special" to also accept special2, but forgot ?: to make that group non-capturing. – Karem Jun 24 '17 at 23:11
Not sure I know what you mean, but it sounds like it is working for you, so hooray. – chris85 Jun 24 '17 at 23:22
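A small sketch of the group-numbering issue in this exchange, again assuming Python's re and the alternation stand-in for (*SKIP)(*FAIL), since re lacks those verbs. The sample string and group_values helper are hypothetical. Wrapping special|special2 in a capturing group makes it group 1 and pushes the bracket text to group 2; (?:...) groups without capturing, so the bracket text stays in group 1.

```python
import re

# Hypothetical input: a plain bracket, a script with id="special2",
# and an ordinary script.
sample = '[A]<script id="special2">[B]</script><script>[C]</script>'

# Capturing group around the id alternation: it becomes group 1,
# so the bracket text moves to group 2.
capturing = re.compile(
    r'<script id="(special|special2)">.*?</script>|\[([^\]]+?)\]', re.DOTALL)

# Non-capturing group (?:...): the bracket text stays in group 1.
non_capturing = re.compile(
    r'<script id="(?:special|special2)">.*?</script>|\[([^\]]+?)\]', re.DOTALL)


def group_values(pattern: re.Pattern, group: int, text: str) -> list[str]:
    """Return `group` from every match in which that group participated."""
    return [m.group(group) for m in pattern.finditer(text)
            if m.group(group) is not None]


print(group_values(capturing, 1, sample))      # ['special2'] (the id, not a bracket)
print(group_values(capturing, 2, sample))      # ['A', 'C']
print(group_values(non_capturing, 1, sample))  # ['A', 'C']
```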
