3

I've been trying to isolate blocks containing a certain string in TextWrangler.

Here is the sample I'm working with.

<ROW num="381">
  <TO>8549672167</TO>
  <FROM>8936742582</FROM>
  <TIME>5/10/2009 19:49:3</TIME>
  <TEXT>Blah Blah Blah</TEXT>
</ROW>
<ROW num="382">
  <TO>8549672167</TO>
  <FROM>8591903412</FROM>
  <TIME>5/10/2009 19:49:37</TIME>
  <TEXT>Hme</TEXT>
</ROW>

What I'm trying to do is isolate all multi-line blocks beginning with <ROW and ending with </ROW> that contain the digits 412 in the line beginning <FROM>.

So in the above example, the second block would be highlighted, but not the first.

I have no idea where to begin with this. Can anybody help? Thanks, MS.

2 Answers

3

Try this:

<ROW[^<]*?>[^<]*<TO>(?=[^<]*412)[^<]*<\/TO>.*?<\/ROW>

Demo


Updated answer, as per the OP's updated question and comment:

<ROW(?=((?!ROW).)*<FROM>\d*412\d*<\/FROM>).*?<\/ROW>

Updated Link For Explanation and Demo
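To see what the updated pattern selects, here is a minimal Python sketch run against the sample from the question. It is only an illustration, not what TextWrangler does internally: the `re.DOTALL` flag is an assumption standing in for an editor setting that lets `.` match line breaks, and the names `SAMPLE`, `PATTERN` and `matches` are made up for this example.

```python
import re

# Sample data copied from the question.
SAMPLE = """<ROW num="381">
  <TO>8549672167</TO>
  <FROM>8936742582</FROM>
  <TIME>5/10/2009 19:49:3</TIME>
  <TEXT>Blah Blah Blah</TEXT>
</ROW>
<ROW num="382">
  <TO>8549672167</TO>
  <FROM>8591903412</FROM>
  <TIME>5/10/2009 19:49:37</TIME>
  <TEXT>Hme</TEXT>
</ROW>
"""

# Same pattern as the updated answer; "/" needs no escaping in Python.
# The lookahead scans forward from "<ROW" without crossing another "ROW",
# so it can only see the <FROM> line of the current block.
# re.DOTALL (an assumption, see above) lets "." cross line breaks.
PATTERN = re.compile(
    r"<ROW(?=((?!ROW).)*<FROM>\d*412\d*</FROM>).*?</ROW>",
    re.DOTALL,
)

# group(0) is the whole matched block, from <ROW to </ROW>.
matches = [m.group(0) for m in PATTERN.finditer(SAMPLE)]

for block in matches:
    print(block)
```

On this sample, only the block with num="382" should be selected, since its <FROM> value ends in 412.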

Mustofa Rizwan
  • Hi this seems to work except that when it finds an instance, it highlights from that instance's " – user7397354 Jan 10 '17 at 12:36
  • Actually this works if you add a "?" after the final asterisk! Is there any way I can do the same type of search but for the "<FROM>" line? I've tried substituting FROM for TO but that doesn't yield anything. Thanks. – user7397354 Jan 10 '17 at 12:50
  • This seems to cause a "catastrophic backtracking". I tried to read up on that but it went straight over my head. It seems to work in the example you provided but when I paste in the rest of the sample, it provides the aforementioned error. – user7397354 Jan 10 '17 at 19:39
  • in your updated example, that negative lookahead within the positive lookahead blew my mind. Thanks – aman207 Nov 25 '21 at 21:33
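On the "catastrophic backtracking" mentioned in the comments above: one possible way around it, if running a small script is an option, is to split the work into two simple steps instead of one pattern with nested lookaheads. The sketch below is in Python rather than TextWrangler's own search, it assumes ROW blocks are well formed and never nested, and the function name `rows_with_from_digits` is hypothetical.

```python
import re

# Step 1 pattern: one ROW block, lazily up to the first closing tag.
BLOCK_RE = re.compile(r"<ROW\b.*?</ROW>", re.DOTALL)
# Step 2 pattern: the digits inside the FROM element of a single block.
FROM_RE = re.compile(r"<FROM>(\d*)</FROM>")


def rows_with_from_digits(text, digits):
    """Return every ROW block whose FROM value contains `digits`."""
    hits = []
    for block in BLOCK_RE.findall(text):
        found = FROM_RE.search(block)
        # Plain substring test on the captured number, no lookaheads needed.
        if found and digits in found.group(1):
            hits.append(block)
    return hits


# Sample data copied from the question.
sample = """<ROW num="381">
  <TO>8549672167</TO>
  <FROM>8936742582</FROM>
  <TIME>5/10/2009 19:49:3</TIME>
  <TEXT>Blah Blah Blah</TEXT>
</ROW>
<ROW num="382">
  <TO>8549672167</TO>
  <FROM>8591903412</FROM>
  <TIME>5/10/2009 19:49:37</TIME>
  <TEXT>Hme</TEXT>
</ROW>
"""

for block in rows_with_from_digits(sample, "412"):
    print(block)
```

Because each pattern here is short and has no nested repetition, it should be much less prone to heavy backtracking on large inputs, though that has not been measured against the full data set from the question.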
-1
<ROW.*>[\s\n]*<TO>.*412.*<\/TO>[\w\d\s\n<>\/:]*<\/ROW>

URL: http://regexr.com/3f1e7

I updated the solution so that it matches 412 inside the TO tag.

Hope this helps.