regex to search pattern and output multiple lines until another pattern

Question

I have a log file, where every log follows a pattern:
Date [FLAG] LogRequestID : Content

The Content part of each log might span multiple lines. Given a LogRequestID, I need to search for all occurrences, and get the entire log. I need this to be done using either perl, awk, sed or pcregrep.

Example input (note that there are no empty lines between the logs):

24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,828 [INFO] 567890 (Blah : Blah1) Service-name:: Content( May span multiple lines)

24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]

24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,831 [INFO] 567890 (Blah : Blah2) Service-name:: Content( May span multiple lines)

Given the search key 123456 I want to extract the following:

24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]

24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

Using grep gives me the single line logs, but only gives me part of the multi-line logs.

I tried checking a few lines after the search pattern using awk, and checking whether another log had been reached, but it becomes too inefficient. I need some sort of regex that can be used with pcregrep, perl or even awk to fetch this output.

Please help me out as I'm pretty bad with regular expressions.

We get requests for write-a-regex-for-me several times every day - readers should be making it clear that Stack Overflow is not a clearing house for free labour. Please _always_ make an effort before posting. — halfer, Jun 13 '17 at 19:02
@halfer I did make an effort. I just didn't include it in my question. Also I wasn't aware of awk's filter{action} method and therefore I thought that I need some complicated multiline recognizing regex, hence the question. I will definitely keep in mind, to include my efforts along with the question next time. — gitmorty, Jun 14 '17 at 10:48

JFS31 · Accepted Answer · 2017-06-13T12:06:47.547

How about that:

awk '/[0-9]{2}[[:space:]][[:alnum:]_]+[[:space:]][0-9]{4}/{ n = 0 }/123456/{ n = 1 }n' file

Output:

    24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

    24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
    ID3=123108 Status=Unknown
    Code=530007 Dest=CA
    ]

    24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

The regex at the start matches the date that begins each entry and sets n to zero. If that line also contains your desired ID, n is set to one. The trailing n is a pattern with no action, so awk prints every line while n is nonzero, which means everything from a matching entry up to the next date.
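
The one-liner above matches the date and the ID anywhere in a line, so a continuation line that happens to contain 123456 would also switch printing on. Here is a rough sketch of a stricter variant, not the answer's original code. It assumes the date parts are separated by single spaces and that the request ID is always the 6th whitespace-separated field, as in the sample. The helper name extract_entries is made up for illustration. The date regex is spelled out without `{2}` intervals so it should also run on awks that lack interval support.

```shell
#!/bin/sh
# extract_entries ID FILE  (hypothetical helper name, for illustration only)
# Prints every log entry, including continuation lines, whose request ID is ID.
extract_entries() {
    awk -v id="$1" '
        # A line starting with a date such as "24 May 2017 " opens a new entry.
        # Assumption: the request ID is field 6, right after the [FLAG] field.
        /^[0-9][0-9] [A-Za-z]+ [0-9][0-9][0-9][0-9] / { n = ($6 == id) }
        # Pattern without an action: print the line while n is nonzero.
        n
    ' "$2"
}

# Sample data taken from the question.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content
24 May 2017 17:00:06,828 [INFO] 567890 (Blah : Blah1) Service-name:: Content( May span multiple lines)
24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]
24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content
24 May 2017 17:00:06,831 [INFO] 567890 (Blah : Blah2) Service-name:: Content( May span multiple lines)
EOF

extract_entries 123456 "$log"
```

Passing the ID with -v keeps the awk program itself fixed, so the same script works for any request ID without editing the regex.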

edited Jun 13 '17 at 12:06

answered Jun 13 '17 at 08:07


Thanks for answering. But the awk script isn't working. It never goes into the first filter's action (i.e. setting n=0), and thus it's printing everything. I can't seem to figure out where the error in the regex is. Can you please recheck? – gitmorty Jun 13 '17 at 09:41
Mhhhh, for me it works. Maybe it is due to our awk version. Because with the regex it should be fine. You can check here https://regex101.com/. The regex is matching the date format 24 May 2017. – JFS31 Jun 13 '17 at 10:37
@JFS31 Thanks. I'm working on a mac and it has BSD awk, and as Ed mentioned, I used the POSIX char classes and it's working. – gitmorty Jun 13 '17 at 11:34
@EdMorton Thank you for the input. That was helpful :) – JFS31 Jun 13 '17 at 11:36
@EdMorton Thanks. One problem I face now is with the { } meta. I've even tried GNU awk, but there doesn't seem to be a way to specify the count of the char class. For now I'm using +. Let me know if there's a way to make it work. – gitmorty Jun 13 '17 at 11:37
@AkhilAvinash Regexp intervals (`{}`) are part of POSIX so if your awk doesn't support it then get a new awk. In very old versions of gawk you had to enable it by `gawk --re-interval 'script'` so try that and if it works then, again, get a new version of gawk as that would mean yours is years behind and you're missing a ton of extremely useful functionality. If that doesn't work for you then I must not understand what you are asking about so provide a code segment. – Ed Morton Jun 13 '17 at 11:46
@EdMorton Mistake! I was too fast in changing it. Thx again. – JFS31 Jun 13 '17 at 12:02
Not supposed to be. I guess when I added the POSIX char classes I messed up the copying. – JFS31 Jun 13 '17 at 12:08
@EdMorton Here's the code. When I use `gawk '/[0-9]{2}[[:space:]][[:alpha:]]+[[:space:]][0-9]{4}/{print}' test.txt` there is no output, whereas when I use `gawk '/[0-9]+[[:space:]][[:alpha:]]+[[:space:]][0-9]+/{print}' test.txt` the regex recognizes the date and gives the output. I even tried the command you mentioned but it still doesn't work. – gitmorty Jun 13 '17 at 12:46
And did you try adding `--re-interval` as I suggested? What is the output of `gawk --version`? – Ed Morton Jun 13 '17 at 12:48
@EdMorton Sorry for the late reply. The --re-interval worked. Thank you. The version is GNU Awk 3.1.7. I guess I need to update it. – gitmorty Jun 13 '17 at 14:12
Yes you're about 5 years and multiple versions (including major releases) behind. – Ed Morton Jun 13 '17 at 14:16
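
For anyone hitting the same `{}` problem discussed in the comments above, here is a small sketch for checking whether a given awk understands regexp intervals. The helper name supports_intervals is invented for this example. It relies on the assumption that an awk without interval support treats `a{2}` as the literal text `a{2}`, so the input `aa` does not match and nothing is printed.

```shell
#!/bin/sh
# supports_intervals AWK_COMMAND  (hypothetical helper name)
# Succeeds if the given awk treats a{2} as "exactly two a's".
supports_intervals() {
    [ "$(printf 'aa\n' | "$1" '/^a{2}$/ { print "yes" }' 2>/dev/null)" = "yes" ]
}

if supports_intervals awk; then
    echo "awk: regexp intervals work"
else
    echo "awk: no interval support; with an old gawk, try gawk --re-interval"
fi
```

If the check fails with an old gawk, the `--re-interval` option mentioned above is worth trying before rewriting the regex with `+` or repeated character classes.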
