4

I have a long list of generated reports that I want to filter. Each report looks something like this:

Report Name
Report Date
Blah blah blah
Blah: WORD1
Blah blah
blah blah: WORD2
blah blah

I'm trying to use ag (PCRE regex) or rg (Rust regex) to find all files that contain both WORD1 and WORD2 in different places in the file (so the match has to span newlines).

I've already searched SX and found these, which didn't work:

> ag (?=.*WORD1)(?=.*WORD2)

> ag (?=.*WORD1)((.|\n)*)(?=.*WORD2)

UPDATE

As @WiktorStribiżew pointed out, ag uses PCRE. Sorry for the mistake.

My expected output is:

blah blah: WORD2

or just the list of matched files.


P.S. Currently I've managed to get it working with this:

> ag "WORD2" $(ag -l "WORD1")
Matthieu M.
SddS

4 Answers

4

You may use a PCRE pattern with ag:

(?s)^(?=.*WORD1)(?=.*WORD2).*\n\K(?-s).*WORD2

See the regex demo.

Details:

  • (?s) - a DOTALL modifier ON (. matches line break chars)
  • ^ - start of string
  • (?=.*WORD1) - there must be WORD1 somewhere in the string
  • (?=.*WORD2) - there must be WORD2 somewhere in the string
  • .* - any 0+ chars, as many as possible, up to the last occurrence of the subsequent subpatterns (if you use a lazy *? quantifier, .*? will match 0+ chars as few as possible up to the first occurrence of the subsequent subpatterns)
  • \n - a newline
  • \K - match reset operator discarding the currently matched text
  • (?-s) - DOTALL mode disabled (. does not match line breaks)
  • .*WORD2 - any 0+ chars other than line break chars, as many as possible, and then WORD2.
Graham
Wiktor Stribiżew
  • See [`(?s)(?:\G(?!\A)|^(?=.*WORD1)(?=.*WORD2)).*?\n\K(?-s).*WORD2` modification](https://regex101.com/r/wdEQ2t/1) if you need to get multiple matches with this approach. – Wiktor Stribiżew Jun 26 '17 at 06:56
  • Thanks, I extended it to three words with this: [`^(?=[\s\S]*WORD1)(?=[\s\S]*WORD3)(?:[\s\S]*\n)?\K.*WORD2`](https://regex101.com/r/18BiHE/3) – SddS Jun 26 '17 at 07:17
  • Ok, but `[\s\S]` is not really necessary when you can control `.` behavior with the inline modifiers. It is not JavaScript or Python, where you have no such option. – Wiktor Stribiżew Jun 26 '17 at 07:26
2

The question mentions this pattern, which works:

ag "WORD2" $(ag -l "WORD1")

But only WORD2 will be highlighted in color. I prefer:

ag 'WORD1|WORD2' --passthru -C3 $(ag -l "WORD1" $(ag -l "WORD2"))

This gives three lines on either side of the matches and highlights both WORD1 and WORD2.

MatrixManAtYrService
1
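# Show lines matching any of the given words, restricted to files that
# contain every one of them. Usage: agmw WORD1 WORD2 [WORD3 ...]
# Note: file names containing whitespace are not handled by xargs here.
agmw() {
  [ "$#" -gt 0 ] || { echo "usage: agmw WORD..." >&2; return 1; }
  local files pattern word
  pattern="$1"
  # Start with the files that contain the first word.
  files=$(ag -l "$1") || return 1
  shift
  for word in "$@"; do
    # Narrow the list to files that also contain this word.
    files=$(printf '%s\n' "$files" | xargs -r ag -l "$word") || return 1
    pattern="$pattern|$word"
  done
  # Print every line matching any of the words in the remaining files.
  printf '%s\n' "$files" | xargs -r ag "$pattern"
}

agmw hello world  # search for files that contain both hello and world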
walkman
0

P.S. Currently I've managed to get it working with this: ag "WORD2" $(ag -l "WORD1")

That's certainly the easiest way to do it. The tools you're talking about are inherently line-oriented, and you're looking to match different lines in the same file.

If you use ack, its -x option reads the list of files to search from standard input, so ack -l WORD1 | ack -x WORD2 does basically the same thing as ack -l WORD1 | xargs ack WORD2, without having to introduce xargs into the pipeline.
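
For readers without ack, the same two-stage idea can be sketched with grep and xargs. This is an illustration rather than what ack does internally: it assumes GNU grep and GNU xargs (for `-Z`, `-0` and `-r`), and the function name `files_with_both` is hypothetical.

```shell
# files_with_both DIR FIRST SECOND
# Hypothetical helper: print the lines matching SECOND, prefixed with the
# file name, but only for files under DIR that also contain FIRST.
files_with_both() {
  # Stage 1: list files containing FIRST, NUL-separated so that file names
  # with spaces survive the pipe.
  # Stage 2: search only those files for SECOND; -r skips the second grep
  # entirely when stage 1 finds nothing.
  grep -rlZ -e "$2" "$1" | xargs -0 -r grep -H -e "$3"
}
```

For example, `files_with_both . WORD1 WORD2` lists the WORD2 lines of every file under the current directory that also mentions WORD1.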

Andy Lester
  • As far as I know, the problem with ack is that it doesn't support multiline regexes. For example, with your solution I would want something like this: `ack -l "regexPatternForMultiLIne" | ack -x "ShowThisValue:"` – SddS Jun 27 '17 at 06:51