How do I count occurrences of a recurring multi-line pattern in a file?

Question

I have a file that has the following pattern.

A
.
.
XYZ
.
.
A
.
.
A
.
.
A
.
.
XYZ

where each "." stands for a line containing random words (never A or XYZ).

I want to count all the occurrences of patterns that match

A
.  (any number of lines)
XYZ

I also want to count it only when A is followed by XYZ and not when A is followed by another A.

I tried

pcregrep -Mc 'A.*(\n?|.)*?XYZ' file.txt

but it fails with

> pcregrep: pcre_exec() gave error -27 while matching text that starts:

Desired output for the above input: 2

Does anyone have any idea how to do this?

Please edit your question and add your desired output (no description) for that sample input. — Cyrus, Apr 06 '23 at 01:35

Onyambu · Answer 1 · 2023-04-06T02:21:59.183


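You could use perl's range operator. Because && binds more tightly than .., the counter sits on the right-hand side of the range and is only reached once an A line has opened it, so it goes up exactly once per A ... XYZ stretch. Use single quotes in a POSIX shell; with double quotes the shell expands $last before perl sees it.

perl -ne '/^A$/ .. (/^XYZ$/ && ++$last); END { print $last + 0, "\n" }' file.txt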
2

edited Apr 06 '23 at 02:21

answered Apr 06 '23 at 02:16

Onyambu


score 0 · Answer 2 · answered Apr 06 '23 at 02:17


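You can use a non-greedy quantifier to match the closest pairs of A and XYZ. The filler between them has to be written as [\s\S] (any character, newlines included); inside a character class, . is a literal dot, so [.\n] would only span lines made of dots:

pcregrep -Mc '^A$[\s\S]*?^XYZ$' file.txt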

Demo: https://regex101.com/r/ETVkuh/3
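
One detail worth checking before relying on a class like [.\n]: inside square brackets a dot is a literal dot, not "any character". The sketch below uses Python's re module as a stand-in for PCRE, and the sample text is made up for illustration; it compares that class with [\s\S], which matches any character including newlines.

```python
import re

# Hypothetical sample: same shape as the question's input, with words on the filler lines.
text = "A\nfoo\nbar\nXYZ\nbaz\nA\nqux\nA\nquux\nXYZ\n"

# Inside [...] the dot is literal, so this can only span lines made of dots.
literal_dot = re.compile(r"^A$[.\n]*?^XYZ$", re.MULTILINE)

# [\s\S] matches any character, newlines included.
any_char = re.compile(r"^A$[\s\S]*?^XYZ$", re.MULTILINE)

literal_count = len(literal_dot.findall(text))
any_count = len(any_char.findall(text))
print(literal_count, any_count)
```

With [\s\S]*? the second match starts at the first A of the run and swallows the later A line, but since only the number of matches is wanted, the count is unaffected.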

answered Apr 06 '23 at 02:17

blhsing


score 0 · Answer 3 · answered Apr 06 '23 at 08:44

With ripgrep:

$ rg -UPc '(?s)^A$((?!^A$).)*?^XYZ$' ip.txt
2

The -U option enables multiline search, -P selects the PCRE2 engine (needed for lookarounds), and -c prints the count. (?s) lets . match newline characters. ((?!^A$).)*? consumes one character at a time, with a negative lookahead that stops the match from running across a line consisting solely of A.

If you don't care about line anchors and A is always a single character, you can simplify the command as follows:

$ rg -Uc 'A[^A]*?XYZ' ip.txt
2
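
To see what the lookahead version actually captures, here is a rough equivalent in Python's re module (an assumption for illustration; ripgrep itself uses PCRE2 here, and the sample text is invented). Each match begins at the last A before an XYZ, because the pattern refuses to step onto another line that is exactly A.

```python
import re

# Hypothetical sample mirroring the question: A, filler, XYZ, filler, then A, A, A, XYZ.
text = "A\nfoo\nbar\nXYZ\nbaz\nA\nqux\nA\nquux\nA\ncorge\nXYZ\n"

# (?s) lets . match newlines; (?m) makes ^ and $ match at line boundaries.
# (?:(?!^A$).)*? advances one character at a time and will not
# step onto the start of a line that is exactly "A".
pattern = re.compile(r"(?ms)^A$(?:(?!^A$).)*?^XYZ$")

matches = pattern.findall(text)
print(len(matches))
print(matches[-1].splitlines())
```

The last match covers only the final A, its filler line, and XYZ, which is the "A followed by XYZ, not by another A" rule from the question.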

dawg · Answer 4 · 2023-04-06T14:39:13.023

You can use:

/^A$(?:(?!^A$)[\s\S])*?^XYZ$/gm


The easiest way to use that is with perl:

perl -0777 -nE '$cnt=()=/^A$(?:(?!^A$)[\s\S])*?^XYZ$/gm; say $cnt;' file

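Or, if you don't want to slurp the whole file, use the range operator and read line by line:

perl -nE '/^A$/ .. (/^XYZ$/ && ++$cnt); END { say $cnt // 0 }' file

The pre-increment matters here: $cnt++ returns the old value, which is 0 (false) the first time, so the first XYZ would not close the range and a later stray XYZ would be counted.

Or, if you want to be even more obscure and perly:

perl -nE '/^A$/ .. (/^XYZ$/ && ++$cnt) }{ say $cnt // 0' file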

score 0 · Answer 5 · answered Apr 07 '23 at 20:04

Using any awk in any shell on every Unix box, no matter which characters your start and end strings contain, and regardless of whether they also appear as substrings at the start of other lines:

$ awk '$0=="A"{ f=1 } f{ if ($0=="XYZ") { cnt++; f=0 } } END{ print cnt+0 }' file
2
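
The same flag-based logic can be sketched outside awk as well. The Python function below is only an illustration: the name count_blocks, its parameters, and the sample list are hypothetical, not part of the answer above. It compares whole lines for equality, so no regex escaping is involved.

```python
from typing import Iterable


def count_blocks(lines: Iterable[str], start: str = "A", end: str = "XYZ") -> int:
    """Count stretches that begin at a `start` line and finish at the next `end` line."""
    count = 0
    in_block = False  # plays the role of the awk flag f
    for raw in lines:
        line = raw.rstrip("\n")
        if line == start:
            in_block = True   # a repeated start line just keeps the block open
        if in_block and line == end:
            count += 1
            in_block = False  # a new start line is required before counting again
    return count


# Hypothetical sample: the trailing XYZ has no A before it and is ignored.
sample = ["A", "foo", "XYZ", "bar", "A", "baz", "A", "qux", "XYZ", "XYZ"]
result = count_blocks(sample)
print(result)
```

To run it against a real file, pass the open file object, for example count_blocks(open("file")), since iterating a file yields its lines.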
