0

I need to count the occurrences of a three-line pattern in an .htm file. Lines 1 and 3 of the pattern have fixed content, but the content of line 2 is not fixed and can change (the file is a log). Here's an example of what I mean:

fix line 1
changing line 2
fix line 3

I have searched for solutions but haven't found one that fits 100%. pcregrep should work, but how do I include the changing line 2? So far I can only match fixed lines. With pcregrep the pattern is my problem, but its output is very easy for me to use.

pcregrep -Mc '^line1\n^line2\n^line3' file

Or should I use sed instead? The command below works, but its output (a list of line numbers) is complicated to use. How do I turn it into a count of occurrences of this multiline pattern? There must be exactly one line between line 1 and line 3; that's important.

sed -n '/^line1/,/^line3/=' file

I hope you can help me. Thank you very much!

McEdy
  • What would you want the output to be if you had interleaved blocks, e.g. `blk1line1 \n blk2line1 \n blk1line3 \n blk2line2 \n blk2line3 \n`, where blk1line1 and blk2line1 both match your `line1` RE and blk1line3 and blk2line3 both match your `line3` RE? – Ed Morton Feb 14 '15 at 17:05
  • I hope that I won't have to deal with interleaving. As far as I have seen I don't have it, but I could be wrong (as you might have noticed already, I am still learning bash). I fetch the htm file with curl and pipe it to the filter. Your code and Avinash Raj's both worked perfectly. – McEdy Feb 14 '15 at 17:16

3 Answers

1

You could use the pcregrep command below.

pcregrep -Mc '^line 1\n[^\n]*\nline 3' file

Example:

$ cat file
line 1
changing line 2
line 3
foo
bar
buz
line 1
changing line
line 3
foo
bar
buz
line 1
bar
line 3
$ pcregrep -Mc '^line 1\n[^\n]*\nline 3' file
3
Avinash Raj
  • The `.*` will match if there's more than one line between lines 1 and 3. It should work with `(?s)(^|\n)\Kline 1\n[^\n]*\nline 3`, though. – Wintermute Feb 14 '15 at 15:57
  • The problem is that it returns `2` when used on this text, for instance: text line_1 text line_3 line_1 text text line_3 (Sorry, I can't post it with line breaks. The spaces represent the line breaks.) There has to be exactly one line between lines 1 and 3, which is why this is so hard for me to do... – McEdy Feb 14 '15 at 16:13
  • It always returns `2` when I use `pcregrep -Mc '(?s)(^|\n)\Kline 1\n[^\n]*\nline 3' text.txt`. That's odd... When I make line 1 the first one of the whole file it works, but in reality that will never be the case... – McEdy Feb 14 '15 at 16:24
0

I don't know what pcregrep is, and I don't have it on any of the UNIX boxes I use, but you could just use awk, since it's available on all UNIX installations. For example, run against @AvinashRaj's sample input file, using GNU awk for its multi-char RS:

$ awk -v RS='^$' '{print gsub(/(^|\n)line 1\n[^\n]*\nline 3\n/,"")}' file
3

or with any awk:

$ awk '{rec=rec $0 RS} END{print gsub(/(^|\n)line 1\n[^\n]*\nline 3\n/,"",rec)}' file
3

I added anchoring to the front and back of the RE so that a longer line which merely contains `line 1` or `line 3` cannot produce a false match.
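As a minimal sketch of what the anchoring buys (not part of the original answer), the script below runs the "any awk" command with and without the anchors on a small made-up input. The helper names `count_unanchored` and `count_anchored` and the sample lines are assumptions chosen for illustration only.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the names are not from the answer.

# Unanchored: "line 1" and "line 3" may match anywhere inside a line.
count_unanchored() {
    awk '{ rec = rec $0 RS }
         END { print gsub(/line 1\n[^\n]*\nline 3/, "", rec) }' "$1"
}

# Anchored: "line 1" must start a line and "line 3" must end one.
count_anchored() {
    awk '{ rec = rec $0 RS }
         END { print gsub(/(^|\n)line 1\n[^\n]*\nline 3\n/, "", rec) }' "$1"
}

# Made-up sample: the first block starts with "not line 1", which only
# contains the text "line 1"; the second block is a genuine match.
sample=$(mktemp)
printf '%s\n' 'not line 1' 'x' 'line 3' 'line 1' 'y' 'line 3' > "$sample"

count_unanchored "$sample"   # prints 2: the "not line 1" block is counted too
count_anchored "$sample"     # prints 1: only the exact block is counted
rm -f "$sample"
```

The unanchored count is inflated by the partial-line match, which is the kind of false match the anchors are there to rule out.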

Ed Morton
0

This might work for you (GNU sed & wc):

sed '1N;N;/fix line 1\n.*\nfix line 3/{x;s/^/\n/;x};$!D;x;s/.//p;d' file | wc -l

This keeps a moving window of 3 lines across the file and adds a newline to the hold space each time the window matches the desired pattern. At the end of the file, one of those newlines is dropped (to offset the newline that sed adds when it prints) and the rest are printed, so wc -l reports the number of matches.

This solution will also cater for interleaved patterns as it specifically looks at all groups of 3 lines throughout the file.
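The moving-window idea can also be sketched in portable awk instead of GNU sed. This is a swapped-in technique rather than the answer's own command, and the helper name `count_window` and its arguments are hypothetical. It compares whole lines exactly, whereas the sed pattern above matches the fixed text anywhere within the window's first and last lines.

```shell
#!/bin/sh
# count_window FILE TOP BOTTOM  (hypothetical helper, not from the answer)
# Counts every window of 3 consecutive lines whose first line equals TOP
# and whose third line equals BOTTOM; the middle line may be anything.
count_window() {
    awk -v top="$2" -v bottom="$3" '
        # Appending "" forces string (not numeric) comparison.
        NR >= 3 && (prev2 "") == (top "") && ($0 "") == (bottom "") { n++ }
        { prev2 = prev1; prev1 = $0 }   # slide the window down one line
        END { print n + 0 }             # n + 0 prints 0 when nothing matched
    ' "$1"
}

# Made-up overlapping input: lines 1-3 and lines 2-4 both form a block.
sample=$(mktemp)
printf '%s\n' 'fix line 1' 'fix line 1' 'fix line 3' 'fix line 3' > "$sample"
count_window "$sample" 'fix line 1' 'fix line 3'   # prints 2
rm -f "$sample"
```

Because every window is inspected independently, both overlapping blocks are counted. An approach that consumes each match as it goes would count the same input only once.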

potong