Regex with positive lookahead across multiple lines

Question

I've been trying to isolate blocks containing a certain string in TextWrangler.

Here is the sample I'm working with.

<ROW num="381">
  <TO>8549672167</TO>
  <FROM>8936742582</FROM>
  <TIME>5/10/2009 19:49:3</TIME>
  <TEXT>Blah Blah Blah</TEXT>
</ROW>
<ROW num="382">
  <TO>8549672167</TO>
  <FROM>8591903412</FROM>
  <TIME>5/10/2009 19:49:37</TIME>
  <TEXT>Hme</TEXT>
</ROW>

What I'm trying to do is isolate all multi-line blocks beginning with <ROW and ending with </ROW> that contain the digits 412 in the line beginning <FROM>.

So in the above example, the second block would be highlighted, but not the first.

I have no idea where to begin with this. Can anybody help? Thanks, MS.

Your 2nd ROW's <TO> doesn't contain 412; <FROM> does. – Mustofa Rizwan Jan 10 '17 at 04:06

Mustofa Rizwan · Accepted Answer · 2017-01-10T18:38:38.053

3

Try this:

<ROW[^<]*?>[^<]*<TO>(?=[^<]*412)[^<]*<\/TO>.*?<\/ROW>

Demo

Updated answer, per the OP's updated question and comment:

<ROW(?=((?!ROW).)*<FROM>\d*412\d*<\/FROM>).*?<\/ROW>
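To see how the updated pattern behaves outside an editor, here is a minimal Python sketch run against the sample from the question. It assumes the dot is allowed to match line breaks (`re.DOTALL` here; in a PCRE-based editor a `(?s)` prefix would likely play the same role), and the names `SAMPLE`, `PATTERN` and `matches` are illustrative only.

```python
import re

# Sample data copied from the question.
SAMPLE = """<ROW num="381">
  <TO>8549672167</TO>
  <FROM>8936742582</FROM>
  <TIME>5/10/2009 19:49:3</TIME>
  <TEXT>Blah Blah Blah</TEXT>
</ROW>
<ROW num="382">
  <TO>8549672167</TO>
  <FROM>8591903412</FROM>
  <TIME>5/10/2009 19:49:37</TIME>
  <TEXT>Hme</TEXT>
</ROW>
"""

# The pattern from the answer. re.DOTALL lets "." cross line breaks
# (an assumption about how the editor is configured).
# The lookahead scans forward without passing another "ROW", so it can
# only see the <FROM> line that belongs to the current block.
PATTERN = re.compile(
    r"<ROW(?=((?!ROW).)*<FROM>\d*412\d*<\/FROM>).*?<\/ROW>",
    re.DOTALL,
)

# group(0) is the whole match; the pattern also has a capture group,
# so finditer is used instead of findall.
matches = [m.group(0) for m in PATTERN.finditer(SAMPLE)]

for block in matches:
    print(block)
```

Only the block whose `<FROM>` line contains 412 is returned; the first block is skipped because the lookahead fails before it reaches the next `ROW`.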

Updated Link For Explanation and Demo

edited Jan 10 '17 at 18:38

answered Jan 10 '17 at 04:24

Mustofa Rizwan


Hi this seems to work except that when it finds an instance, it highlights from that instance's " – user7397354 Jan 10 '17 at 12:36
Actually this works if you add a "?" after the final asterisk! Is there any way I can do the same type of search but for the "<FROM>" line? I've tried substituting FROM for TO but that doesn't yield anything. Thanks. – user7397354 Jan 10 '17 at 12:50
This seems to cause a "catastrophic backtracking". I tried to read up on that but it went straight over my head. It seems to work in the example you provided but when I paste in the rest of the sample, it provides the aforementioned error. – user7397354 Jan 10 '17 at 19:39
in your updated example, that negative lookahead within the positive lookahead blew my mind. Thanks – aman207 Nov 25 '21 at 21:33
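On the catastrophic backtracking report in the comments: if the data can be processed outside the editor, one way to sidestep the problem is to split the work into two cheap steps, first cutting the text into `<ROW>...</ROW>` blocks and then testing each block's `<FROM>` line on its own. This is a sketch of an alternative technique, not the answer's regex; the function name `rows_with_from_digits` and the shortened sample data are made up for illustration.

```python
import re

# Step 1: a lazy match from each "<ROW" to the nearest "</ROW>".
ROW_BLOCK = re.compile(r"<ROW\b.*?</ROW>", re.DOTALL)


def rows_with_from_digits(text, digits):
    """Return the ROW blocks whose <FROM> value contains `digits`.

    Hypothetical helper: it assumes ROW blocks are not nested and that
    the <FROM> value consists of digits only, as in the question's sample.
    """
    # Step 2: a small pattern applied to one block at a time.
    from_line = re.compile(r"<FROM>\d*" + re.escape(digits) + r"\d*</FROM>")
    return [block for block in ROW_BLOCK.findall(text) if from_line.search(block)]


sample = """<ROW num="381">
  <TO>8549672167</TO>
  <FROM>8936742582</FROM>
</ROW>
<ROW num="382">
  <TO>8549672167</TO>
  <FROM>8591903412</FROM>
</ROW>
"""

result = rows_with_from_digits(sample, "412")
for block in result:
    print(block)
```

Because neither pattern contains a lookahead nested inside a repeated group, there is much less room for the engine to retry the same stretch of text in many different ways.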

Po Stevanus Andrianta · Answer 2 · 2017-01-10T04:23:50.243

-1

<ROW.*>[\s\n]*<TO>.*412.*<\/TO>[\w\d\s\n<>\/:]*<\/ROW>

url : http://regexr.com/3f1e7

I updated the solution to match 412 inside the TO tag.

Hope this helps.

edited Jan 10 '17 at 04:23

answered Jan 10 '17 at 04:15

Po Stevanus Andrianta


If the text contains special characters, this will not work. – Mustofa Rizwan Jan 10 '17 at 04:36
Sorry this does not seem to work, it seems to highlight the whole sample if it finds any instance of 412 in a line. – user7397354 Jan 10 '17 at 12:39
@Maverick_Mrt yes, it wasn't a perfect answer – Po Stevanus Andrianta Jan 10 '17 at 13:51
@user7397354 in what case? tested and working correctly in regexr – Po Stevanus Andrianta Jan 10 '17 at 13:51
