I need to grep for a multi-line string that doesn't include one string, but does include others. This is what I'm searching for in some HTML files:

<not-this>
   <this> . . . </this>
</not-this>

In other words, I want to find files that contain <this> and </this> on the same line, but only where that line is not surrounded by the HTML tags <not-this> on the lines before and/or after. Here is some shorthand logic for what I want to do:

grep 'this' && '/this' && !('not-this')

I've seen answers with the following...

grep -Er -C 2 '.*this.*this.*' . | grep -Ev 'not-this'

...but this just erases the line(s) containing the "not" portion, and displays the other lines. What I'd like is for it to not pull those results at all if "not-this" is found within a line or two of "this".

Is there a way to accomplish this?

P.S. I'm using Ubuntu and gnome-terminal.

Kyle Challis

1 Answer

It sounds like an awk script might work better here:

$ cat input.txt
<not-this>
   <this>BAD! DO NOT PRINT!</this>
</not-this>

<yes-this>
   <this>YES! PRINT ME!</this>
</yes-this>


$ cat not-this.awk
BEGIN {
  notThis=0
}

/<not-this>/        {notThis=1}
/<\/not-this>/      {notThis=0}
/<this>.*<\/this>/  {if (notThis==0) print}

$ awk -f not-this.awk input.txt
   <this>YES! PRINT ME!</this>

Or, if you'd prefer, you can squeeze this awk script onto one long line:

$ awk 'BEGIN {notThis=0} /<not-this>/ {notThis=1} /<\/not-this>/ {notThis=0} /<this>.*<\/this>/ {if (notThis==0) print}' input.txt
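
Since the question asks for files in a tree of HTML files rather than individual lines, the same script can probably be wrapped in `find` so that it prints each matching file name once. This is only a minimal sketch and not part of the original answer: the function name `find_this_outside_not_this` and the `*.html` filter are assumptions made for illustration, and it shares the original script's assumption that the `<not-this>` tags sit on their own lines.

```shell
# Hypothetical helper: list files under a directory (default: current one)
# that contain a <this>...</this> line outside any <not-this> block.
find_this_outside_not_this() {
  find "${1:-.}" -type f -name '*.html' -exec awk '
    FNR == 1        { notThis = 0; reported = 0 }  # reset state at the start of each file
    /<not-this>/    { notThis = 1 }                # entering a <not-this> block
    /<\/not-this>/  { notThis = 0 }                # leaving a <not-this> block
    /<this>.*<\/this>/ && !notThis && !reported {
      print FILENAME                               # report each file only once
      reported = 1
    }
  ' {} +
}
```

Calling `find_this_outside_not_this some/dir` would search that directory recursively; with no argument it searches the current directory.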
Mike Holt
  • 2
    You can shorten this some. There is no need to initialize the variable to `0`, and you can simplify the test like this: `awk '/<not-this>/ {notThis=1} /<\/not-this>/ {notThis=0} /<this>.*<\/this>/ && !notThis' file` – Jotne Apr 01 '14 at 05:25
  • 2
    @Jotne Thanks! I'm always interested in learning better ways to do things. I must admit I don't use `awk` enough to know more than the bare basics. – Mike Holt Apr 01 '14 at 07:06