How to match all files containing word1 AND word2 across different lines with ag or rg (PCRE/Rust regex)

Question

I have a long list of generated reports that I want to filter. Each report looks something like this:

Report Name
Report Date
Blah blah blah
Blah: WORD1
Blah blah
blah blah: WORD2
blah blah

I'm trying to use ag (PCRE regex) or rg (Rust regex) to find all files that contain both WORD1 AND WORD2 at different places in the file (that is, separated by newlines).

I've already searched SX and found these, which didn't work:

> ag (?=.*WORD1)(?=.*WORD2)

> ag (?=.*WORD1)((.|\n)*)(?=.*WORD2)

UPDATE

As @WiktorStribiżew pointed out, ag uses PCRE. Sorry for the mistake.

My expected output is:

blah blah: WORD2

or just the list of matched files.

P.S. Currently I've managed by using this:

> ag "WORD2" $(ag -l "WORD1")

@anubhava One drawback of that is that it assumes an order between the two words. Not that this may actually matter for the OP. — Tim Biegeleisen, Jun 26 '17 at 06:11
Try [`(?s)^(?=.*WORD1)(?=.*WORD2).*\n\K(?-s).*WORD2`](https://regex101.com/r/KuNJ1c/1) with Ag. Note Ag uses PCRE, not Perl regex. — Wiktor Stribiżew, Jun 26 '17 at 06:24
@WiktorStribiżew, thanks, it works. Could you please make your comment an answer and elaborate a bit on how it works? — SddS, Jun 26 '17 at 06:42
@anubhava, thanks but it isn't a real AND as Tim Biegeleisen pointed it out. — SddS, Jun 26 '17 at 06:43
ok in that case use: [`^(?=[\s\S]*WORD1)(?:[\s\S]*\n)?\K.*WORD2`](https://regex101.com/r/18BiHE/1) — anubhava, Jun 26 '17 at 06:47
@anubhava, thanks. I think the better answer is the one which could be easily extended to 3 and more matches. — SddS, Jun 26 '17 at 06:49
`3 or matches` You should update the question with more clarity. — anubhava, Jun 26 '17 at 06:50
Yes, you're right: the question was originally about two matches, and yours and Wiktor's pretty much do the job. But I think it would be good to generalize the case to more matches. I'll create another question. Thanks. — SddS, Jun 26 '17 at 07:00
This seems like an over-complication? `ag WORD2 | ag WORD1` or `rg WORD2 | rg WORD1` both work fine. — BurntSushi5, Jun 26 '17 at 13:16
@BurntSushi5, That doesn't work for me but `ag "ShowThis" $(ag -l "WORD1" $(ag -l "WORD2"))` works. — SddS, Jun 26 '17 at 21:32
`ag ShowThis | ag WORD1 | ag WORD2` will yield every line that contains `ShowThis` and `WORD1` and `WORD2`. — BurntSushi5, Jun 27 '17 at 01:04
@BurntSushi5, that's the problem: WORD1, WORD2 and "ShowThis" are each on a different line. — SddS, Jun 27 '17 at 06:31

score 4 · Accepted Answer · edited Sep 24 '17 at 04:53

You may use a PCRE pattern with ag:

(?s)^(?=.*WORD1)(?=.*WORD2).*\n\K(?-s).*WORD2

See the regex demo.

Details:

(?s) - a DOTALL modifier ON (. matches line break chars)
^ - start of string
(?=.*WORD1) - there must be WORD1 somewhere in the string
(?=.*WORD2) - there must be WORD2 somewhere in the string
.* - any 0+ chars, as many as possible, up to the last occurrence of the subsequent subpatterns (if you use a lazy *? quantifier, .*? will match 0+ chars as few as possible up to the first occurrence of the subsequent subpatterns)
\n - a newline
\K - match reset operator discarding the currently matched text
(?-s) - DOTALL mode disabled (. does not match line breaks)
.*WORD2 - any 0+ chars other than line break chars, as many as possible, and then WORD2.
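Since the comments ask about extending this to three or more words, here is a minimal sketch of how the same pattern could be assembled for any number of words. The function name `all_words_pattern` is hypothetical, and the sketch assumes the words contain no regex metacharacters (they are inserted verbatim). As in the pattern above, the line reported is the one holding the last word.

```bash
# Hypothetical helper: build the lookahead pattern described above for any
# number of words. Words are inserted verbatim (no regex escaping).
all_words_pattern() {
  [ "$#" -gt 0 ] || return 2
  local pattern='(?s)^' word last
  for word in "$@"; do
    pattern+="(?=.*${word})"   # one lookahead per required word
    last=$word
  done
  # Consume text up to a newline, reset the match with \K, then match the
  # line holding the last word with DOTALL switched off again.
  printf '%s\n' "${pattern}.*\\n\\K(?-s).*${last}"
}
```

With two words this prints exactly the pattern from this answer, so it could be used as `ag "$(all_words_pattern WORD1 WORD2 WORD3)"`.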

edited Sep 24 '17 at 04:53

Graham

answered Jun 26 '17 at 06:51

Wiktor Stribiżew

See [`(?s)(?:\G(?!\A)|^(?=.*WORD1)(?=.*WORD2)).*?\n\K(?-s).*WORD2` modification](https://regex101.com/r/wdEQ2t/1) if you need to get multiple matches with this approach. – Wiktor Stribiżew Jun 26 '17 at 06:56
Thanks, I extended it to three words with this: [`^(?=[\s\S]*WORD1)(?=[\s\S]*WORD3)(?:[\s\S]*\n)?\K.*WORD2`](https://regex101.com/r/18BiHE/3) – SddS Jun 26 '17 at 07:17

Ok, but `[\s\S]` is not really necessary when you can control `.` behavior with the inline modifiers. It is not JavaScript or Python, where you have no such option. – Wiktor Stribiżew Jun 26 '17 at 07:26

score 2 · Answer 2 · answered Sep 23 '18 at 19:58

The question mentions this pattern, which works:

ag "WORD2" $(ag -l "WORD1")

But only WORD2 will be highlighted in color. I prefer:

ag 'WORD1|WORD2' --passthru -C3 $(ag -l "WORD1" $(ag -l "WORD2"))

This gives three lines on either side of the matches and highlights both WORD1 and WORD2.
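One caveat with the nested `$(...)` form: if an inner command finds no files, the outer command receives no path arguments and falls back to searching the current directory. A minimal sketch that guards against this follows. The helper name `files_with_all_words` and the `SEARCH_TOOL` variable are hypothetical, and the sketch assumes bash 4+ (for `mapfile`) and a tool that accepts `-l PATTERN FILE...`, as ag, rg and grep do.

```bash
# Hypothetical helper (not part of ag or rg): print the files, among those
# listed after "--", that contain every given word.
# SEARCH_TOOL is an assumed override; it defaults to ag.
# File names containing newlines are not handled.
files_with_all_words() {
  local tool=${SEARCH_TOOL:-ag} word
  local -a words=() files=()
  # Arguments before "--" are words; arguments after it are candidate files.
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    words+=("$1")
    shift
  done
  [ "$#" -gt 0 ] && shift   # drop the "--" separator
  files=("$@")
  for word in "${words[@]}"; do
    # Stop once the list is empty: with no paths the tool would search
    # the current directory (or stdin) instead of nothing.
    [ "${#files[@]}" -gt 0 ] || return 1
    mapfile -t files < <("$tool" -l "$word" "${files[@]}" </dev/null)
  done
  [ "${#files[@]}" -gt 0 ] || return 1
  printf '%s\n' "${files[@]}"
}
```

Assuming GNU xargs and a hypothetical `reports/` directory, the result could then be fed to the highlighting search with `files_with_all_words WORD1 WORD2 -- reports/* | xargs -r -d '\n' ag 'WORD1|WORD2' -C3`, which runs nothing at all when no file qualifies.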

walkman · Answer 3 · 2022-08-10T08:47:43.063

1

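# Show lines matching any of the given words, but only in files that
# contain all of them (each word may be on a different line).
# Note: file names containing whitespace are not handled by xargs here.
agmw() {
  [ "$#" -gt 0 ] || { echo "usage: agmw WORD..." >&2; return 2; }
  local pattern files word
  pattern=$(IFS='|'; printf '%s' "$*")    # WORD1|WORD2|... for the final search
  files=$(ag -l "$1") || return 1         # files containing the first word
  shift
  for word in "$@"; do
    # Keep only the files that also contain the next word.
    files=$(printf '%s\n' "$files" | xargs -r ag -l "$word") || return 1
  done
  printf '%s\n' "$files" | xargs -r ag "$pattern"
}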

agmw hello world  # search for hello and world across all files

edited Aug 10 '22 at 08:47

answered Dec 16 '20 at 03:07

walkman

score 0 · Answer 4 · answered Jun 27 '17 at 03:21

> p.s. currently I've managed to using this: ag "WORD2" $(ag -l "WORD1")

That's certainly the easiest way to do it. The tools you're talking about are inherently line-oriented, and you're looking to match different lines in the same file.

If you use ack, its -x option reads the list of files to search from standard input, so `ack -l WORD1 | ack -x WORD2` is basically the same as `ack -l WORD1 | xargs ack WORD2`, without having to introduce xargs into the pipeline.

answered Jun 27 '17 at 03:21

Andy Lester

As far as I know, the problem with ack is that it doesn't support multiline regexes. For example, with your solution I'd want something like this: `ack -l "regexPatternForMultiLIne" | ack -x "ShowThisValue:"` – SddS Jun 27 '17 at 06:51
