
I want to exclude a word, say "Dogma", from a regular expression inside an awk script, so that the regex captures every line except the ones containing "Dogma". How can I do that?

input

<animal>Because I am a seeker of truth, I do not accept every bit of dogma as fact</animal><animal>The dog kept barking all night</animal><animal>The mills of God grind slowly</animal>

my regex

reg = "<" animal ">" [(^Dogma)+]</" animal ">"

desired output

<animal>Because I am a seeker of truth, I do not accept every bit of dogma as fact</animal><animal>The d## kept barking all night</animal><animal>The mills of G## grind slowly</animal>

I match each line against the regex defined above; if the line matches, the script extracts the target word and replaces it with #. This logic works fine for my other scenarios but not this one, because the regular expression also ignores lines that contain "Dogs" or "Gods". How can I make the regex exclude the word as a whole? Any suggestions would be appreciated.

  • This has also been [discussed here](https://unix.stackexchange.com/questions/318839/awk-negative-regular-expression). If AWK supported PCRE, it could be done using negative lookahead. – MyICQ Mar 14 '22 at 06:42
  • Added a few more details. – Dave Johnson Mar 14 '22 at 07:05
  • Just to spell out the misunderstanding, `[^abc]` means "match _a single character_ which is not one of `a`, `b`, or `c`." There is no facility to directly say "match a string which isn't `abc`" in Awk regex. – tripleee Mar 14 '22 at 07:29
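
Since the comments above note that awk regex has no way to say "a string which isn't `Dogma`", one common workaround is to keep the exclusion out of the regex and test it separately with `!~`. The following is only a sketch under several assumptions that the question does not confirm: `mask_animals` is a hypothetical helper name, the tag is literally `<animal>`, elements contain no nested `<`, the excluded word is matched case-insensitively, and the substitution rule is "`dog` becomes `d##`, `God` becomes `G##`", as inferred from the desired output.

```shell
# mask_animals: hypothetical helper, reads lines on stdin, writes to stdout.
mask_animals() {
    awk '
    {
        out = ""
        rest = $0
        # Walk the line one <animal>...</animal> element at a time.
        while (match(rest, /<animal>[^<]*<\/animal>/)) {
            elem = substr(rest, RSTART, RLENGTH)
            # The exclusion is a separate negated match, not part of the regex.
            # tolower() makes the test catch "Dogma" and "dogma" alike (assumption).
            if (tolower(elem) !~ /dogma/) {
                gsub(/dog/, "d##", elem)   # assumed masking rule
                gsub(/God/, "G##", elem)   # assumed masking rule
            }
            # Keep any text before the element, then the (possibly masked) element.
            out = out substr(rest, 1, RSTART - 1) elem
            rest = substr(rest, RSTART + RLENGTH)
        }
        # Emit whatever trails the last element unchanged.
        print out rest
    }'
}
```

Piping the question's sample input through `mask_animals` should leave the element mentioning "dogma" untouched while masking the other two. It uses only `match`, `substr`, `gsub`, and `tolower`, so it should not depend on gawk-specific features.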
