I was using less to browse a very large text log file (15 GB) and tried to search for a multiline pattern, but after some investigation I found that less can only search for single-line patterns.

Is there a way to use grep or another command to return the line number of a multiline pattern?

The log contains hundreds of thousands of entries in this format:

Packet A
op_3b       : 001
ctrl_2b     : 01
ini_count   : 5

Packet F
op_3b       : 101
ctrl_2b     : 00
ini_count   : 4

Packet X
op_3b       : 010
ctrl_2b     : 11
ini_count   : 98

Packet CA
op_3b       : 100
ctrl_2b     : 01
ini_count   : 5

Packet LP
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0

Packet ZZ
op_3b       : 111
ctrl_2b     : 01
ini_count   : 545

Packet QEA
op_3b       : 111
ctrl_2b     : 11
ini_count   : 0

What I want is for grep or some other command to return the line number at which this three-line pattern starts:

op_3b       : 001
ctrl_2b     : 00
ini_count   : 0
seven-phases-max
AlphonseVA

4 Answers

Suppose the pattern is stored in a file named pattern:

$ cat pattern
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0

Then, try:

$ awk '$0 ~ pat' RS=  pat="$(cat pattern)" logfile
Packet LP
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0

How it works

  • RS=

    This sets the record separator RS to an empty string, which puts awk in paragraph mode: records are separated by one or more blank lines.

  • pat="$(cat pattern)"

    This tells awk to create an awk variable pat which contains the contents of the file pattern.

    If your shell is bash (ksh and zsh also support this), a slightly more efficient form of this command is pat="$(<pattern)". It is not POSIX, so don't use it unless you are sure which shell will run the command.

  • $0 ~ pat

    This tells awk to print any record that matches the pattern.

    $0 is the contents of the current record. ~ tells awk to do a match between the text in $0 and the regular expression in pat.

    (If the contents of pattern had any regex active characters, we would need to escape them. Your current example does not have any so this is not a problem.)
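
If the pattern might contain regex-active characters or backslashes, one way to sidestep escaping is to compare literal strings instead of regular expressions. The following is only a sketch under assumptions that are not in the answer above: it passes the pattern through a hypothetical environment variable PAT (so awk does no escape-sequence processing on it) and uses awk's index() for a plain substring test. The temporary files are made-up sample data.

```shell
# Sample data for illustration only (not the real 15 GB log).
tmp=$(mktemp -d)
cat > "$tmp/pattern" <<'EOF'
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0
EOF
cat > "$tmp/logfile" <<'EOF'
Packet A
op_3b       : 001
ctrl_2b     : 01
ini_count   : 5

Packet LP
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0

Packet ZZ
op_3b       : 111
ctrl_2b     : 01
ini_count   : 545
EOF

# PAT is a hypothetical variable name. index() returns a non-zero position
# when the record contains the text of PAT literally, so the record is printed.
found=$(PAT="$(cat "$tmp/pattern")" awk -v RS= 'index($0, ENVIRON["PAT"])' "$tmp/logfile")
printf '%s\n' "$found"

rm -rf "$tmp"
```

Like the regex version, this is a substring test, so a pattern ending in "ini_count   : 0" would also be found inside a record whose last line is "ini_count   : 05".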

Alternative style

Some people prefer a different style for defining awk variables:

$ awk -v RS=  -v pat="$(cat pattern)" '$0 ~ pat' logfile
Packet LP
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0

This works the same.
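
For the commands in this answer the two styles are interchangeable. They do differ in timing, which can matter in other scripts: a -v assignment is made before the BEGIN block runs, while an assignment given as an operand takes effect only when awk reaches it in the argument list. A minimal sketch, using a throwaway variable x that is not part of the answer:

```shell
# -v: the variable is already set when BEGIN runs.
with_v=$(awk -v x=1 'BEGIN { print "x=" x }')

# Operand style: BEGIN runs before any operand is processed, so x is still empty.
as_operand=$(awk 'BEGIN { print "x=" x }' x=1)

echo "$with_v"      # x=1
echo "$as_operand"  # x=
```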

Displaying line numbers

$ awk -F'\n' '$0 ~ pat{print "Line Number="n+1; print "Packet" $0} NF{n=n+NF-1}' RS='Packet'  pat="$(cat pattern)" logfile
Line Number=21
Packet LP
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0
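
If a multi-character RS is not available, a single pass with default line-by-line records also works. The sketch below is an alternative technique, not the method used above: it keeps the last few lines in a small ring buffer and compares them, as exact whole lines, with the lines of the pattern file. It prints the line number of the first line of each match (the op_3b line rather than the Packet line). The sample files are made up for illustration.

```shell
# Sample data for illustration only.
tmp=$(mktemp -d)
cat > "$tmp/pattern" <<'EOF'
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0
EOF
cat > "$tmp/logfile" <<'EOF'
Packet A
op_3b       : 001
ctrl_2b     : 01
ini_count   : 5

Packet LP
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0

Packet ZZ
op_3b       : 111
ctrl_2b     : 01
ini_count   : 545
EOF

first_lines=$(awk '
  NR == FNR { pat[++np] = $0; next }   # first file: remember the pattern lines
  {
    buf[FNR % np] = $0                 # ring buffer holding the last np lines
    if (FNR >= np) {
      ok = 1
      for (i = 1; i <= np; i++)
        if (buf[(FNR - np + i) % np] != pat[i]) { ok = 0; break }
      if (ok) print FNR - np + 1       # line number where the match starts
    }
  }
' "$tmp/pattern" "$tmp/logfile")
echo "$first_lines"   # 7 for this sample

rm -rf "$tmp"
```

Because the comparison is an exact string match on whole lines, regex metacharacters in the pattern need no escaping, but trailing whitespace or different spacing in the log would prevent a match.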
John1024
  • Thank you. Yours also works quite well and is, if it is important, less format sensitive than mine. – John1024 Apr 16 '19 at 05:23
  • I have tried it and it works as intended. Thanks. Is it possible to have `awk` return the line number at which the pattern occurred? – AlphonseVA Apr 16 '19 at 05:35
  • This will get you the line number: `'++i && $0 ~ pat { print $0"\n\n"i*5-4 }'` – Rafael Apr 16 '19 at 05:51
  • @AlphonsevonAlexandric I just added a version to the answer that keeps track of line numbers. Rafael's version in the comment above, which assumes 5 lines per entry, also works. – John1024 Apr 16 '19 at 06:00
  • You should mention that the version to get line numbers requires GNU awk for multi-char RS. All versions will interpret escape sequences so for example `\t` in `pattern` will be converted to a literal tab in `pat` for matching against the input file and so wouldn't match if `\t` existed in the input. They'll also interpret RE metachars as such so `.` will match any char, etc. – Ed Morton Apr 16 '19 at 15:27

Here's my scant attempt:

awk -v RS="" -v FS="\n" -v op=001 -v ctrl=00 -v ini=0 '$2~op&&$3~ctrl&&$4~ini' data.txt
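
One thing to watch with unanchored matches such as $4~ini: with ini=0 the test succeeds for any fourth line that contains a 0 anywhere, for example "ini_count   : 10". The sketch below compares the original tests with an anchored variant. The helper function field_is and the sample file are hypothetical, and the anchoring assumes each line looks like "name : value" with no trailing whitespace.

```shell
# Sample data for illustration only; Packet A differs from the target only in ini_count.
tmp=$(mktemp -d)
cat > "$tmp/data.txt" <<'EOF'
Packet A
op_3b       : 001
ctrl_2b     : 00
ini_count   : 10

Packet LP
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0
EOF

# Unanchored tests, as in the command above, printing only the packet name.
loose=$(awk -v RS="" -v FS="\n" -v op=001 -v ctrl=00 -v ini=0 \
  '$2~op&&$3~ctrl&&$4~ini { print $1 }' "$tmp/data.txt")

# Anchored tests: the whole line must be "<name> : <value>".
strict=$(awk -v RS="" -v FS="\n" -v op=001 -v ctrl=00 -v ini=0 '
  function field_is(line, name, value) {
    return line ~ ("^" name "[ \t]*:[ \t]*" value "$")
  }
  field_is($2, "op_3b", op) && field_is($3, "ctrl_2b", ctrl) && field_is($4, "ini_count", ini) { print $1 }
' "$tmp/data.txt")

echo "loose:  $loose"    # matches both Packet A and Packet LP
echo "strict: $strict"   # matches only Packet LP

rm -rf "$tmp"
```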
Rafael

The best approach so far is the one from John1024 using awk, since it does the job in one pass. If you really want a grep solution, you can use:

$ grep -m 1 -zoP 'Packet\s*[^\s]*\s*(?=op_3b\s*:\s*001\s*ctrl_2b\s*:\s*00\sini_count\s*:\s*0)' file
Packet LP

Notes:

  • -m 1 makes grep stop after the first matching record; remove it if your pattern appears several times.
  • -z makes grep treat the input as NUL-separated records instead of newline-separated lines, so a file without NUL bytes is read as a single record and a pattern can span lines.
  • -o prints only the matched text rather than the whole record.
  • -P enables Perl-compatible regular expressions (PCRE).

If you want to have the line number(s):

grep -n -f <(grep -m 1 -zoP 'Packet\s*[^\s]*\s*(?=op_3b\s*:\s*001\s*ctrl_2b\s*:\s*00\sini_count\s*:\s*0)' file) file
21:Packet LP

However, this needs two passes over the file, so for a 15 GB file awk is the better approach.

Allan
  • I didn't realize grep had a multiline flag (I've always used sed's `N`). That's cool! – Rafael Apr 16 '19 at 08:29
  • @Rafael: Yeah it is a great option, but as far as I know it is only available for GNU grep: https://www.gnu.org/software/grep/manual/html_node/Output-Line-Prefix-Control.html – Allan Apr 16 '19 at 08:31

If your data is in a file named 'd', try:

grep -nEA2 '^op_3b\s*:\s*001' d

Replace 001 in the command above with the op_3b value you are searching for. -A2 prints the two lines that follow each match.