
I seem to have to relearn regex and grep syntax every time I need something advanced. This time, even with BBEdit's Pattern Playground, I can't work this one out.

I need to do a multi-line search for the occurrence of two literal asterisks anywhere in the text between a pair of tags in a plist/XML file.

I can successfully construct a lookbetween so:

(?s)(?<=<array>).*?(?=</array>)

I try to limit that to only match occurrences in which two asterisks appear between tags:

(?s)(?<=<array>).*?[*]{2}.*?(?=</array>)
(?s)(?<=<array>).+[*]{2}.+(?=</array>)
(?s)(?<=<array>).+?[*]{2}.+?(?=</array>)

But they find nothing. When I remove the {2}, I realize I'm not even constructing it right to find a single asterisk. I tried escaping the character as /* and [/*], but to no avail.

How can I match any occurrence of blah blah * blah blah * blah blah ?

Andy Lester
brianfit

2 Answers


[*]{2} means the two asterisks must be consecutive.

(.*[*]){2} is what you're looking for - it contains two asterisks, with anything in between them.
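To make the difference concrete, here is a small Python sketch (Python is my choice for illustration; the question itself is about BBEdit's grep). It runs both patterns against the sample string from the question:

```python
import re

# Sample text from the question: two asterisks that are NOT adjacent.
text = "blah blah * blah blah * blah blah"

# [*]{2} only matches "**", i.e. two asterisks side by side.
consecutive = re.search(r"[*]{2}", text)

# (.*[*]){2} matches "anything, then an asterisk", twice in a row.
separated = re.search(r"(.*[*]){2}", text)

print(consecutive)         # None
print(separated.group(0))  # blah blah * blah blah *
```

The second pattern stops at the second asterisk because nothing after the group asks for more text.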

But we also need to make sure a match stays inside a single <array>...</array> pair. So instead of .*, we use ((?!<\/array>).)*, which refuses to consume the closing tag </array> while it scans forward.

The regex can be written as:

(?s)(?<=<array>)(?:((?!<\/array>).)*?[*]){2}(?1)*
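The (?1) at the end is a PCRE-style subroutine call that reuses the first capture group. Python's built-in re module does not support that syntax, so the sketch below is an adaptation rather than the exact pattern: the group is written out by hand, and the sample data is made up for illustration.

```python
import re

# One character that does not begin the closing tag </array>.
# This is the body of group 1 in the pattern above, made non-capturing.
TEMPERED = r"(?:(?!</array>).)"

# Same logic as the pattern above, with (?1)* expanded to TEMPERED + "*".
PATTERN = re.compile(
    r"(?s)(?<=<array>)(?:" + TEMPERED + r"*?[*]){2}" + TEMPERED + r"*"
)

# Hypothetical plist fragment: only the first array holds two asterisks.
sample = (
    "<array>\n<string>a * b</string>\n<string>c * d</string>\n</array>\n"
    "<array>\n<string>no stars</string>\n</array>\n"
)

matches = PATTERN.findall(sample)
print(len(matches))  # 1
print(matches[0])
```

The forward slash needs no escaping in Python, so `<\/array>` is written as `</array>` here.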

See the test result here

Hao Wu

Use

(?s)(?<=<array>)(?:(?:(?!<\/?array>)[^*])*[*]){2}.*?(?=</array>)

See proof.

Explanation

NODE                 EXPLANATION
(?s)                 set flags for this block (with . matching \n) (case-sensitive) (with ^ and $ matching normally) (matching whitespace and # normally)
(?<=                 look behind to see if there is:
  <array>              '<array>'
)                    end of look-behind
(?:                  group, but do not capture (2 times):
  (?:                  group, but do not capture (0 or more times (matching the most amount possible)):
    (?!                  look ahead to see if there is not:
      </?array>            </array> or <array>
    )                    end of look-ahead
    [^*]                 any character except: '*'
  )*                   end of grouping
  [*]                  any character of: '*'
){2}                 end of grouping
.*?                  any character (0 or more times (matching the least amount possible))
(?=                  look ahead to see if there is:
  </array>             '</array>'
)                    end of look-ahead
Ryszard Czech
  • This is a slightly less compact expression than Hao Wu's answer, which I had already accepted, but it works too, and I truly appreciate the effort you put in to break down the logic element by element. If I am reading this right, it's slightly more efficient than Hao Wu's in that it groups and eliminates any tag pair with NO asterisk in it [^*] before testing for two occurrences. Did I get that right? – brianfit Jan 26 '21 at 08:41
  • @brianfit This solution is more precise as it does not overlap array tags not having double asterisks. Only the tags with double asterisks will get matched thanks to a tempered greedy token. It is not that fast, but makes matching more precise. Cf. [Wu's](https://regex101.com/r/i4Muic/1) and [mine](https://regex101.com/r/i4Muic/2) – Ryszard Czech Jan 26 '21 at 21:50