I'm trying to match the last occurrence of a string in a log file.

[03/03/2019 09:16:36] Moving message 123456789 from NEW to PENDING
[03/03/2019 09:16:36] Retrieving file(s) of type DATAWAREHOUSE for 123456
[03/03/2019 09:16:36] collecting warehouse version 7.3.1 files for 123456...
[03/03/2019 09:16:37] Moving message 123456789 from NEW to PENDING
[03/03/2019 09:16:37] Retrieving file(s) of type DATAWAREHOUSE for 123456
[03/03/2019 09:16:37] collecting warehouse version 7.3.1 files for 123456...
[03/03/2019 09:16:38] Moving message 123456789 from NEW to PENDING
[03/03/2019 09:16:39] Retrieving file(s) of type DATAWAREHOUSE for 123456
[03/03/2019 09:16:40] collecting warehouse version 7.3.1 files for 123456...

Above is a sample log file that contains three occurrences of the following string:

Moving message 123456789 from NEW to PENDING

I need to match the last occurrence to get its timestamp, "[03/03/2019 09:16:38]". When all the entries are on a single line, a greedy approach (.*) works fine, but when they are spread across multiple lines it does not. I haven't tried the multiline (m) flag, as I'm not sure how to use it. Can someone please help me construct a regex to retrieve the timestamp of this last occurrence? Example: https://regex101.com/r/fnwPsB/1

tuxian
  • Perhaps, like `(?s:.*\n)?\K\[\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2}\] Moving message 123456789 from NEW to PENDING`? See https://regex101.com/r/fnwPsB/2 – Wiktor Stribiżew Mar 05 '19 at 10:57
  • See this: https://regex101.com/r/fnwPsB/3 – anubhava Mar 05 '19 at 10:58
  • Both works great! thank you so much. @anubhava 's is exactly what I need. Thanks both of you! – tuxian Mar 05 '19 at 11:03
  • FYI: To get a substring out of a whole match, you need to use [capturing groups](https://www.regular-expressions.info/brackets.html), so the `\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2}` pattern should be enclosed with capturing parentheses. – Wiktor Stribiżew Mar 05 '19 at 11:30

2 Answers

You may use

(?s:.*\n)?\K\[(\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2})\] Moving message 123456789 from NEW to PENDING

See the regex demo

Details

  • (?s:.*\n)? - an optional inline modifier group (DOTALL applies only inside it) that matches any 0+ chars, as many as possible, up to the last LF char that is followed by a match of the subsequent patterns, i.e. up to the start of the last occurrence.
  • \K - match reset operator that discards all text matched so far from the overall match value
  • \[(\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2})\] Moving message 123456789 from NEW to PENDING - the specific line pattern to get with the datetime captured in Group 1.
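For readers outside PCRE, here is a minimal sketch of the same greedy-prefix idea in Python. It assumes Python 3.6+ (for the scoped `(?s:...)` flag) and drops `\K`, which the standard `re` module does not support, so the timestamp is read from Group 1 instead of the whole match. The names `LOG` and `last_timestamp` are made up for the illustration, and the log is a shortened version of the sample in the question.

```python
import re

# Shortened copy of the sample log from the question.
LOG = (
    "[03/03/2019 09:16:36] Moving message 123456789 from NEW to PENDING\n"
    "[03/03/2019 09:16:36] Retrieving file(s) of type DATAWAREHOUSE for 123456\n"
    "[03/03/2019 09:16:37] Moving message 123456789 from NEW to PENDING\n"
    "[03/03/2019 09:16:38] Moving message 123456789 from NEW to PENDING\n"
    "[03/03/2019 09:16:39] Retrieving file(s) of type DATAWAREHOUSE for 123456\n"
)

# The greedy, DOTALL-scoped prefix consumes as much text as possible and then
# backtracks to the last line break that is followed by the target line.
PATTERN = re.compile(
    r"(?s:.*\n)?"
    r"\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]"
    r" Moving message 123456789 from NEW to PENDING"
)


def last_timestamp(text):
    """Return the timestamp of the last matching line, or None (hypothetical helper)."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(last_timestamp(LOG))
```

Without `\K` the overall match also contains everything before the last occurrence, which is why only Group 1 is used here.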

Alternatively, use

(?s)(\[(\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2})\] Moving message 123456789 from NEW to PENDING)(?!.*(?1))

See this regex demo.

Details

  • (?s) - DOTALL modifier making . match any char, including line break chars
  • (\[(\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2})\] Moving message 123456789 from NEW to PENDING) - the necessary pattern to match captured into Group 1 and the datetime in Group 2
  • (?!.*(?1)) - a negative lookahead that fails the match if the same pattern as defined in Group 1 occurs after any 0+ chars to the right of the current position; (?1) is a subroutine call that reuses the Group 1 pattern without repeating it.
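Python's standard `re` module has no `(?1)` subroutine call, so the sketch below approximates it by building the pattern from a shared string and pasting it into the lookahead. This is an emulation under that assumption, not the PCRE feature itself; `TIMESTAMP`, `MESSAGE`, and `last_timestamp_lookahead` are names invented for the example.

```python
import re

TIMESTAMP = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}"
MESSAGE = r"Moving message 123456789 from NEW to PENDING"

# The line pattern without groups, reused inside the lookahead (stands in for (?1)).
LINE = r"\[" + TIMESTAMP + r"\] " + MESSAGE
# The same line pattern with the datetime captured.
LINE_CAPTURING = r"\[(" + TIMESTAMP + r")\] " + MESSAGE

# Group 1 is the whole line, Group 2 is the datetime, as in the PCRE version.
PATTERN = re.compile(r"(?s)(" + LINE_CAPTURING + r")(?!.*" + LINE + r")")


def last_timestamp_lookahead(text):
    """Return the datetime of the last matching line, or None (hypothetical helper)."""
    match = PATTERN.search(text)
    return match.group(2) if match else None


sample = (
    "[03/03/2019 09:16:36] Moving message 123456789 from NEW to PENDING\n"
    "[03/03/2019 09:16:37] Moving message 123456789 from NEW to PENDING\n"
    "[03/03/2019 09:16:38] Moving message 123456789 from NEW to PENDING\n"
    "[03/03/2019 09:16:39] Retrieving file(s) of type DATAWAREHOUSE for 123456\n"
)
print(last_timestamp_lookahead(sample))
```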
Wiktor Stribiżew

Here is a solution that does not depend on PCRE-specific features; it uses only a negative lookahead:

(?s)\[(\d{2}\/\d{2}\/\d{4} \d{2}:\d{2}:\d{2})\] Moving message 123456789 from NEW to PENDING(?!.* Moving message 123456789 from NEW to PENDING)

RegEx Demo

The date-time is available in the first capture group.

Here (?!.* Moving message 123456789 from NEW to PENDING) is a negative lookahead that ensures we match only the very last occurrence of the given pattern.
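As a quick check of why the `(?s)` flag matters here, the sketch below runs the same pattern in Python with and without `re.DOTALL`. The variable names and the shortened log are assumptions made for the illustration. Without DOTALL the `.*` in the lookahead cannot cross a line break, so the lookahead never sees a later occurrence and every line matches.

```python
import re

BODY = (
    r"\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]"
    r" Moving message 123456789 from NEW to PENDING"
    r"(?!.* Moving message 123456789 from NEW to PENDING)"
)

log_text = (
    "[03/03/2019 09:16:36] Moving message 123456789 from NEW to PENDING\n"
    "[03/03/2019 09:16:37] Moving message 123456789 from NEW to PENDING\n"
    "[03/03/2019 09:16:38] Moving message 123456789 from NEW to PENDING\n"
    "[03/03/2019 09:16:40] collecting warehouse version 7.3.1 files for 123456...\n"
)

# With DOTALL the lookahead scans to the end of the text, so only the last line survives.
with_dotall = re.findall(BODY, log_text, flags=re.DOTALL)

# Without DOTALL the lookahead stops at the end of each line, so all three lines match.
without_dotall = re.findall(BODY, log_text)

print(with_dotall)
print(without_dotall)
```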

anubhava