I have a file that has the following pattern.

A
.
.
XYZ
.
.
A
.
.
A
.
.
A
.
.
XYZ

where each "." stands for a line of random words (never A or XYZ).

I want to count all the occurrences of patterns that match

A
.  (any number of lines)
XYZ

I also want to count it only when A is followed by XYZ and not when A is followed by another A.

I tried

pcregrep -Mc 'A.*(\n?|.)*?XYZ' file.txt

but it fails with

> pcregrep: pcre_exec() gave error -27 while matching text that starts:

Desired output for the above input: 2

Does anyone have any idea how to do this?

Gilles Quénot
Pranav
    Please edit your question and add your desired output (no description) for that sample input. – Cyrus Apr 06 '23 at 01:35

5 Answers

You could use perl:

perl -ne '/^A$/../^XYZ$/ && ($last += 1); END { print $last + 0, "\n" }' file.txt
2
Onyambu

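You can use a non-greedy quantifier to match the closest pairs of A and XYZ. Use [\s\S] for "any character, including a newline"; inside a character class such as [.\n], the dot is a literal dot:

pcregrep -Mc '^A$[\s\S]*?^XYZ$' file.txt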

Demo: https://regex101.com/r/ETVkuh/3
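
A small sketch of how the character class behaves, using Python's re module rather than pcregrep (assuming both treat these constructs the same way). The two sample strings are made up for illustration: one uses literal "." filler lines as in the question, the other replaces them with the word "foo".

```python
import re

# Question's layout with literal "." filler lines, then the same with words.
dots = "A\n.\n.\nXYZ\n.\n.\nA\n.\n.\nA\n.\n.\nA\n.\n.\nXYZ\n"
words = dots.replace(".", "foo")

in_class = re.compile(r"(?m)^A$[.\n]*?^XYZ$")   # '.' inside [...] is a literal dot
any_char = re.compile(r"(?m)^A$[\s\S]*?^XYZ$")  # any character, newlines included

for name, text in (("dots", dots), ("words", words)):
    # Print the sample name, then the match count for each pattern.
    print(name, len(in_class.findall(text)), len(any_char.findall(text)))
```

On filler lines that are real words, the [.\n] form finds nothing, while [\s\S] still finds both blocks.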

blhsing

With ripgrep:

$ rg -UPc '(?s)^A$((?!^A$).)*?^XYZ$' ip.txt
2

The -U option enables multiline search. -P is for PCRE (since we need lookarounds) and -c is to get the count. (?s) enables . to match newline characters. ((?!^A$).)*? is a negated group to prevent matching A as a whole line.
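
To illustrate what the negative lookahead buys you, here is a minimal sketch in Python's re module (not ripgrep; the sample text is invented to mirror the question's layout). The count matches a plain lazy search, but each match now starts at the A nearest to its XYZ instead of the first A in a run.

```python
import re

# Same 16-line layout as the question, with arbitrary words as filler.
SAMPLE = "\n".join(
    ["A", "foo", "bar", "XYZ", "baz", "qux",
     "A", "foo", "bar", "A", "baz", "qux",
     "A", "foo", "bar", "XYZ"]
) + "\n"

# (?m): ^ and $ match at line boundaries; (?s): . also matches newlines.
# (?!^A$) stops the span from crossing a line that is exactly "A".
TEMPERED = re.compile(r"(?ms)^A$(?:(?!^A$).)*?^XYZ$")

blocks = [m.group(0).splitlines() for m in TEMPERED.finditer(SAMPLE)]
print(len(blocks))   # number of A ... XYZ blocks found
print(blocks[-1])    # lines of the last block
```

Here the last block covers only the final four lines, because the two earlier A lines are followed by another A before any XYZ.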

If you don't care about line anchors and A is always a single character, you can simplify the command as follows:

$ rg -Uc 'A[^A]*?XYZ' ip.txt
2
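
A quick check of what dropping the anchors can cost, sketched in Python's re module under the assumption that it handles these constructs like the regex engines above. The input is hypothetical: it contains the letter A inside a word but no line that is exactly "A".

```python
import re

text = "BANANA\nfoo\nXYZ\n"   # hypothetical input: no line is exactly "A"

# Unanchored: any 'A' character can open a match.
loose = re.findall(r"A[^A]*?XYZ", text)

# Anchored: only whole lines "A" and "XYZ" count.
strict = re.findall(r"(?ms)^A$(?:(?!^A$).)*?^XYZ$", text)

print(len(loose), len(strict))
```

The unanchored pattern reports a block starting at the last A of BANANA, so it is only safe when the letter A cannot appear inside other lines.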
Sundeep

You can use:

/^A$(?:(?!^A$)[\s\S])*?^XYZ$/gm

Demo

The easiest way to use that is with perl:

perl -0777 -nE '$cnt=()=/^A$(?:(?!^A$)[\s\S])*?^XYZ$/gm; say $cnt;' file

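Or, if you don't want to slurp the whole file, use the range operator line by line:

perl -nE '/^A$/../^XYZ$/ && ++$cnt; END{say $cnt+0}' file

Because && binds tighter than .., the increment is part of the range's end condition, so it has to be a pre-increment: $cnt++ would return 0 at the first XYZ and leave the range open.

Or if you want to be even more obtuse and perly:

perl -nE '/^A$/../^XYZ$/ && ++$cnt}{ say $cnt+0' file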
dawg

Using any awk in any shell on every Unix box, no matter which characters your start and end strings contain and regardless of whether they also occur at the start of other lines:

$ awk '$0=="A"{ f= 1} f{ if ($0=="XYZ") { cnt++; f=0 } } END{ print cnt+0 }' file
2
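
The same flag-based logic, sketched in Python for anyone who wants to test the edge cases. The function name count_blocks and the sample lists are made up for illustration; the comparison is a plain whole-line string test, as in the awk script.

```python
def count_blocks(lines, start="A", end="XYZ"):
    """Count start..end blocks using whole-line string comparison."""
    inside = False
    count = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if line == start:
            inside = True      # a repeated start line just keeps the flag set
        if inside and line == end:
            count += 1
            inside = False     # wait for the next start line
    return count


sample = ["A", "foo", "bar", "XYZ", "baz", "qux",
          "A", "foo", "bar", "A", "baz", "qux",
          "A", "foo", "bar", "XYZ"]
print(count_blocks(sample))
```

Since the function accepts any iterable of lines, an open file object can be passed directly, e.g. count_blocks(open("file.txt")).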
Ed Morton