
I am attempting to write a regex that will return a multiple-line match from a log file. Using the sample below, I want to match an entire 'transaction', which begins and ends with the same text as ALL other transactions in the log (Start and End). However, between those lines there is a custom identifier (in this case an email address) that differentiates one transaction from another.

Start of a transaction.
random line 1.
random line 2.
email1@gmail.com
End of a transaction.
Start of a transaction.
random line 1.
random line 2.
email1@yahoo.com
random line 3.
End of a transaction.

Here is what I am starting with:

^Start(.*?)\n(((.*?)(email1\@gmail\.com)(.*?)|(.*?))\n){1,}End (.*?)\n

Essentially, I want to say: begin with 'Start' and match all lines until an 'End' line, but only return a match if one of those lines contains a particular email address.
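That rule can also be expressed without one big regex, by scanning line by line. The sketch below is an alternative technique, not something either answer proposes. It is written in Python, and `transactions_with` is a hypothetical helper name. It assumes every transaction opens with a line starting with "Start" and closes with a line starting with "End", as in the sample.

```python
# Sample log from the question.
LOG = (
    "Start of a transaction.\n"
    "random line 1.\n"
    "random line 2.\n"
    "email1@gmail.com\n"
    "End of a transaction.\n"
    "Start of a transaction.\n"
    "random line 1.\n"
    "random line 2.\n"
    "email1@yahoo.com\n"
    "random line 3.\n"
    "End of a transaction.\n"
)


def transactions_with(lines, needle):
    """Yield each Start..End block (as a list of lines) that contains needle."""
    block = None
    for line in lines:
        if line.startswith("Start"):
            block = [line]  # a new transaction begins
        elif block is not None:
            block.append(line)
            if line.startswith("End"):
                if any(needle in text for text in block):
                    yield block
                block = None  # transaction closed, matched or not


for block in transactions_with(LOG.splitlines(), "email1@yahoo.com"):
    print("\n".join(block))
```

Because each block is closed at its own "End" line, a match can never span two transactions.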

Right now my regex treats the entire log file as a single match, presumably because line 1 contains a 'Start', line X contains an 'End', and somewhere in the hundreds of lines in between there is a match for the email. Also, the application is PowerShell and will be using a Select-String pattern, if that matters.
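The over-matching is easy to reproduce. The sketch below uses a simplified lazy pattern standing in for the attempt above (it is not the exact regex), run with Python's `re` module on the sample log. It searches for the second transaction's address. The engine commits to the leftmost "Start" it can match, and `.*?` then keeps extending past the first "End" until it reaches the email, so one match covers both transactions.

```python
import re

# Sample log from the question.
LOG = (
    "Start of a transaction.\n"
    "random line 1.\n"
    "random line 2.\n"
    "email1@gmail.com\n"
    "End of a transaction.\n"
    "Start of a transaction.\n"
    "random line 1.\n"
    "random line 2.\n"
    "email1@yahoo.com\n"
    "random line 3.\n"
    "End of a transaction.\n"
)

# Simplified lazy pattern; DOTALL lets "." cross newlines.
NAIVE = r"Start.*?email1@yahoo\.com.*?End[^\n]*"

match = re.search(NAIVE, LOG, re.DOTALL)
# The leftmost starting position wins before laziness matters, so the match
# begins at the FIRST "Start" and swallows the first transaction too.
print(match.group(0).count("Start of a transaction."))  # prints 2
```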

tresstylez

2 Answers


Use a negative lookahead assertion to make sure your regex never matches across an "End of transaction" boundary:

preg_match_all(
    '/^                                # Start of line
    Start\ of\ a\ transaction\.        # Match starting tag.
    (?:                                # Start non-capturing group.
     (?!End\ of\ a\ transaction)       # Only match if we\'re not at the start of the closing tag.
     .                                 # Match any character
    )*                                 # any number of times.
    email1@gmail\.com                  # Match the required email address
    (?:(?!End\ of\ a\ transaction).)*  # and the rest of the transaction.
    ^                                  # Then match (at the start of a line)
    End\ of\ a\ transaction\.\n        # the closing tag./smx', 
    $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];

Test it live on regex101.com.
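For comparison, here is a rough Python port of the same tempered-token pattern. The answer's code is PHP, and Python's `re` is used here only as an illustration. `find_transactions` and `NOT_END` are hypothetical names. The address is passed through `re.escape` so its dot is matched literally. The `/s` and `/m` modifiers become `re.DOTALL` and `re.MULTILINE`, and the pattern is built from concatenated strings instead of x-mode comments.

```python
import re

# Sample log from the question.
LOG = (
    "Start of a transaction.\n"
    "random line 1.\n"
    "random line 2.\n"
    "email1@gmail.com\n"
    "End of a transaction.\n"
    "Start of a transaction.\n"
    "random line 1.\n"
    "random line 2.\n"
    "email1@yahoo.com\n"
    "random line 3.\n"
    "End of a transaction.\n"
)

# Tempered greedy token: consume one character at a time, but never at a
# position where "End of a transaction" begins, so a match stays in its block.
NOT_END = r"(?:(?!End of a transaction).)*"


def find_transactions(log, email):
    pattern = (
        r"^Start of a transaction\."    # opening line (^ needs MULTILINE)
        + NOT_END
        + re.escape(email)              # the identifier, matched literally
        + NOT_END
        + r"^End of a transaction\.\n"  # closing line, as in the PHP version
    )
    # DOTALL lets "." cross newlines inside a single transaction.
    return re.findall(pattern, log, re.MULTILINE | re.DOTALL)


print(find_transactions(LOG, "email1@yahoo.com"))
```

As in the PHP version, the closing `\n` means a final "End" line with no trailing newline would not match.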

Tim Pietzcker

Use the s modifier to make . match newline characters:

(?s)Start((?!Start).)*email1\@gmail\.com(.*?)End([^\n]*)

Note: ((?!Start).)* performs a negative lookahead at every position the * quantifier steps into. That prevents the match from running into the next 'Start', so it always stays within a single block.
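A quick Python check of this pattern follows. It swaps in the second transaction's address (the answer searches for the gmail one) only to show the pattern skipping a non-matching first block. `re.search` returns just the first match, which fits the answer's assumption that the email identifies a single transaction.

```python
import re

# Sample log from the question.
LOG = (
    "Start of a transaction.\n"
    "random line 1.\n"
    "random line 2.\n"
    "email1@gmail.com\n"
    "End of a transaction.\n"
    "Start of a transaction.\n"
    "random line 1.\n"
    "random line 2.\n"
    "email1@yahoo.com\n"
    "random line 3.\n"
    "End of a transaction.\n"
)

# The answer's pattern with the address swapped; (?s) is the inline DOTALL flag.
PATTERN = r"(?s)Start((?!Start).)*email1\@yahoo\.com(.*?)End([^\n]*)"

match = re.search(PATTERN, LOG)
# ((?!Start).)* cannot step onto a second "Start", so the attempt beginning at
# the first block fails and the match begins at the second block instead.
print(match.group(0))
```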


revo
  • Lazy quantifiers aren't enough to keep the regex from crossing "End of transaction" boundaries: https://regex101.com/r/mU4vW8/2 – Tim Pietzcker May 12 '16 at 06:31
  • @TimPietzcker That's because you are using `g` modifier, which has to do its best job. – revo May 12 '16 at 06:34
  • No. The `g` modifier means "Find all matches instead of just the first one". – Tim Pietzcker May 12 '16 at 06:34
  • @TimPietzcker Yes and you shouldn't do that. OP says there is an **identifier** (email) but you are duplicating an identifier in provided text input. – revo May 12 '16 at 06:36
  • OK, I wasn't sure from the question if there will always be exactly one match - you're quite possibly right that that's the case. But still, your solution has the same problem OP is describing - if the item you're looking for is not the very first one in the file, the regex will match too much. https://regex101.com/r/mU4vW8/3 illustrates my point better than my first comment. – Tim Pietzcker May 12 '16 at 06:38
  • @TimPietzcker You nearly destroyed my RegEx by adding your desired modifiers (`g` and `m`) which I didn't offer in my answer. If you encounter a problem in my exact RegEx let me know. I'll modify it then. – revo May 12 '16 at 06:41
  • Without the `m` modifier, `^` only matches at the start of the entire string - and we *know* that the text we're looking for is most likely not at the start of string. Read the last paragraph of the question. – Tim Pietzcker May 12 '16 at 06:43
  • @TimPietzcker You were right about caret. I updated the answer. – revo May 12 '16 at 07:36