
I want to search for `{{ upc }}` and start the capture not from the `<div` immediately before the match but from the second `<div` before it, i.e. `<div class="form-group">`. Likewise, the capture should end not at the first `</div>` after the match but at the second one, i.e. the closing `</div>` of the form group, or equivalently just before the start of the next `<div class="form-group">` (depending on how you look at it).

Here is a sample of the HTML/Twig template I want to search and replace in:

<div class="form-group">
    <label class="col-sm-2 control-label" for="input-sku"><span data-toggle="tooltip" title="{{ help_sku }}">{{ entry_sku }}</span></label>
    <div class="col-sm-10">
        <input type="text" name="sku" value="{{ sku }}" placeholder="{{ entry_sku }}" id="input-sku" class="form-control"/>
    </div>
</div>
<div class="form-group">
     <label class="col-sm-2 control-label" for="input-upc"><span data-toggle="tooltip" title="{{ help_upc }}">{{ entry_upc }}</span></label>
     <div class="col-sm-10">
         <input type="text" name="upc" value="{{ upc }}" placeholder="{{ entry_upc }}" id="input-upc" class="form-control"/>
     </div>
</div>
<div class="form-group">
     <label class="col-sm-2 control-label" for="input-ean"><span data-toggle="tooltip" title="{{ help_ean }}">{{ entry_ean }}</span></label>
     <div class="col-sm-10">
         <input type="text" name="ean" value="{{ ean }}" placeholder="{{ entry_ean }}" id="input-ean" class="form-control"/>
     </div>
</div>

The expected regex match is as follows:

<div class="form-group">
     <label class="col-sm-2 control-label" for="input-upc"><span data-toggle="tooltip" title="{{ help_upc }}">{{ entry_upc }}</span></label>
     <div class="col-sm-10">
         <input type="text" name="upc" value="{{ upc }}" placeholder="{{ entry_upc }}" id="input-upc" class="form-control"/>
     </div>
</div>

All help appreciated. Thank you.

Trent Renshaw

2 Answers


You need to match the divs you want, absorb everything inside them, and exclude the rest.

`[\w\W]` means "any word character or any non-word character", i.e. any character at all. In particular it matches newline characters, which `.` does not (unless the `s` flag is set).
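A minimal Python sketch of that difference (Python's `re` is an assumed stand-in for the engine; it treats `.` and `[\w\W]` the same way PCRE does, and `text` is an illustrative string):

```python
import re

text = "line one\nline two"

# "." stops at the newline, so ".*" cannot span both lines.
print(re.fullmatch(r".*", text))                          # None
# [\w\W] is "any word char or any non-word char": every character, "\n" included.
print(re.fullmatch(r"[\w\W]*", text) is not None)         # True
# re.DOTALL (the /s flag in PCRE) makes "." match newlines as well.
print(re.fullmatch(r".*", text, re.DOTALL) is not None)   # True
```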

[\w\W]*(<div[\w\W]*?<div[\w\W]*?{{ sku }}[\w\W]*?<\/div>[\w\W]*?<\/div>)[\w\W]*
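A rough Python check of how this pattern behaves, under a few assumptions: `{{ sku }}` is swapped for the question's `{{ upc }}`, the braces are escaped for clarity, and `TEMPLATE` is a trimmed copy of the question's sample (same div nesting, shorter attributes). The useful part is capture group 1; the overall match covers the whole document, which is the greediness the asker objects to below.

```python
import re

# Trimmed copy of the question's sample: same div nesting, shorter attributes.
TEMPLATE = """<div class="form-group">
    <label for="input-sku">{{ entry_sku }}</label>
    <div class="col-sm-10">
        <input name="sku" value="{{ sku }}" id="input-sku"/>
    </div>
</div>
<div class="form-group">
    <label for="input-upc">{{ entry_upc }}</label>
    <div class="col-sm-10">
        <input name="upc" value="{{ upc }}" id="input-upc"/>
    </div>
</div>
<div class="form-group">
    <label for="input-ean">{{ entry_ean }}</label>
    <div class="col-sm-10">
        <input name="ean" value="{{ ean }}" id="input-ean"/>
    </div>
</div>
"""

# The answer's pattern with {{ upc }} in place of {{ sku }} and the braces escaped.
# The greedy [\w\W]* prefix backtracks until the capture can start at the last
# <div that still has another <div between it and {{ upc }}.
PATTERN = re.compile(
    r"[\w\W]*(<div[\w\W]*?<div[\w\W]*?\{\{ upc \}\}[\w\W]*?<\/div>[\w\W]*?<\/div>)[\w\W]*"
)

m = PATTERN.search(TEMPLATE)
print(m.group(1))  # only the form-group that holds {{ upc }}
# m.group(0) is the entire TEMPLATE, so callers must read group 1.
```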

samthegolden
  • Thanks, but it's far too greedy, with too many steps. The HTML/Twig code is just a small part of the whole document, and unfortunately this regex matches everything before and after `{{ sku }}`. In the end I decided to capture from left to right, starting with the opening `<div class="form-group">`, matching `{{ sku }}` (non-capturing) in the middle, up until the next opening `<div class="form-group">`, as follows: `(<div class="form-group">.*(?:{{ sku }}).*)(?:<div class="form-group">)`
    – Trent Renshaw Apr 20 '20 at 09:13
  • You said you wanted to capture two divs ahead of the chosen word... – samthegolden Apr 20 '20 at 09:26
  • And btw that does not match your example: https://regex101.com/r/QkK8LY/1 – samthegolden Apr 20 '20 at 09:28
  • You are using the `/gm` modifiers, not `/s`, which is what makes the dot match newline characters; example: https://regex101.com/r/yFPT9o/1. Even so, the regex is not the solution, as it will start the match from the first `<div class="form-group">` and run through to `{{ sku }}`. I think this will be a task for lookahead and lookbehind.
    – Trent Renshaw Apr 20 '20 at 10:32
  • How does my regex not fulfil your requirements? If you explain better I may improve it – samthegolden Apr 20 '20 at 10:40
  • see your regex and test string in v2 https://regex101.com/r/QkK8LY/2 .. I have expanded the test string to give more context / clarity – Trent Renshaw Apr 20 '20 at 11:42
  • Check my edit, @TrentRenshaw. I was not matching spaces and other variations... This way you don't need to specify an initial pattern or use lookaheads; the non-greedy capture does the work. – samthegolden Apr 20 '20 at 12:45
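To illustrate the problem described in this thread, here is a rough Python check of the asker's left-to-right pattern. It rests on assumptions: the markup missing from the comments is taken to be `<div class="form-group">`, `{{ upc }}` replaces `{{ sku }}`, `re.DOTALL` stands in for `/s`, and `TEMPLATE` is a trimmed copy of the question's sample.

```python
import re

# Trimmed copy of the question's sample: same div nesting, shorter attributes.
TEMPLATE = """<div class="form-group">
    <label for="input-sku">{{ entry_sku }}</label>
    <div class="col-sm-10">
        <input name="sku" value="{{ sku }}" id="input-sku"/>
    </div>
</div>
<div class="form-group">
    <label for="input-upc">{{ entry_upc }}</label>
    <div class="col-sm-10">
        <input name="upc" value="{{ upc }}" id="input-upc"/>
    </div>
</div>
<div class="form-group">
    <label for="input-ean">{{ entry_ean }}</label>
    <div class="col-sm-10">
        <input name="ean" value="{{ ean }}" id="input-ean"/>
    </div>
</div>
"""

# ASSUMPTION: the stripped markup in the comments was <div class="form-group">.
PATTERN = re.compile(
    r'(<div class="form-group">.*(?:\{\{ upc \}\}).*)(?:<div class="form-group">)',
    re.DOTALL,
)

captured = PATTERN.search(TEMPLATE).group(1)
# The search succeeds at the very first form-group, and the greedy trailing .*
# runs on to the LAST <div class="form-group">, so the sku block is captured too.
print(captured)
```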

One thing you can try is to use a negative lookahead to filter out the things you don't want included in your match. For instance, matching a `<div`, followed by anything and then another `<div`, can match things like `<div></div><div>`.

Instead, you can match `<div`, followed by anything as long as it is not `</div>`, and then another `<div`.

<div    (?:(?!</div>).)*    <div
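A minimal Python sketch of that difference (Python's `re` is an assumed choice of engine; `naive`, `tempered` and the sample strings are illustrative names):

```python
import re

# ".*" may run across a closing tag; the lookahead-guarded token refuses to.
naive = re.compile(r"<div.*<div")
tempered = re.compile(r"<div(?:(?!</div>).)*<div")

closed_between = "<div></div><div>"  # a </div> sits between the two <div
nested = "<div><div>"                # nothing is closed in between

print(bool(naive.search(closed_between)))     # True
print(bool(tempered.search(closed_between)))  # False
print(bool(tempered.search(nested)))          # True
```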

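Then you can insert that same subpattern anywhere in your expression where you would normally write `.*`. In this particular case, you can repeat it to make sure you don't hit a closing div before the UPC, and then continue with the `{{ upc }}` portion. (The extra spaces in these patterns are only for readability and should be removed; the `s` flag is also needed so that `.` can match the newlines in the template.)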

<div(?:(?!</div>).)*<div    (?:(?!</div>).)*    {{ upc }}    .*?</div>\s*</div>

Here is a demo
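A rough Python equivalent of such a demo, as a sketch under assumptions: Python's `re` is the engine, the readability spaces are dropped, `re.DOTALL` plays the role of the `s` flag, and `TEMPLATE`, `NO_CLOSE` and `PATTERN` are illustrative names (`TEMPLATE` is a trimmed copy of the question's sample).

```python
import re

# Trimmed copy of the question's sample: same div nesting, shorter attributes.
TEMPLATE = """<div class="form-group">
    <label for="input-sku">{{ entry_sku }}</label>
    <div class="col-sm-10">
        <input name="sku" value="{{ sku }}" id="input-sku"/>
    </div>
</div>
<div class="form-group">
    <label for="input-upc">{{ entry_upc }}</label>
    <div class="col-sm-10">
        <input name="upc" value="{{ upc }}" id="input-upc"/>
    </div>
</div>
<div class="form-group">
    <label for="input-ean">{{ entry_ean }}</label>
    <div class="col-sm-10">
        <input name="ean" value="{{ ean }}" id="input-ean"/>
    </div>
</div>
"""

# Any run of characters, as long as no "</div>" begins inside it.
NO_CLOSE = r"(?:(?!</div>).)*"

# The answer's pattern, spaces removed; DOTALL lets "." cross the newlines.
PATTERN = re.compile(
    r"<div" + NO_CLOSE + r"<div" + NO_CLOSE + r"\{\{ upc \}\}" + r".*?</div>\s*</div>",
    re.DOTALL,
)

m = PATTERN.search(TEMPLATE)
print(m.group(0))  # the upc form-group, from <div class="form-group"> to its </div>
```

Because `NO_CLOSE` cannot cross a `</div>`, a match that starts at the sku block dies at that block's first closing tag, so the leftmost successful start is the form-group that actually contains `{{ upc }}`.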

Quixrick