Suppose I have an input file with lines of text:

line 1
line 2
line 3
line 4
line 2

Now suppose I would like to check whether my input file contains

line 2
line 3

and remove that block of text if it is found. This would give:

line 1
line 4
line 2

Note that I don't want to remove just every occurrence of line 2 or line 3; but only if they are found one after another. (In reality I want to check for a block of 5 lines, and not just any block of code between two placeholders, but let's keep the example simple).

I looked into awk, but that gets complicated very quickly. (What I have so far is unfinished, since I feel this is not the right approach and that it will explode with 5 lines...)

awk '/line 2/ {if (line0) {print line0; line0=""}; line0=$0}' input.txt
Chris Maes
  • That's not a stellar duplicate; I'm still looking for a better one. This is certainly a FAQ but it's hard to find a good collection of actually correct answers. Anyway, I'd go for the `perl -0777` one-liner unless the input file is huge. – tripleee Sep 03 '18 at 15:45
  • see https://stackoverflow.com/questions/20961661/how-to-remove-lines-above-and-below-an-inverse-grep-match it can be adapted to your use case. – Red Cricket Sep 03 '18 at 16:00
  • Neither the question this was closed as a dup of, nor the suggested alternative contains reasonable answers to this question so I'm going to re-open it (at least until a better dup can be found). – Ed Morton Sep 03 '18 at 16:09
  • Please add your desired output for that sample input to your question. – Cyrus Sep 03 '18 at 16:21
  • @cyrus: thanks. Done. – Chris Maes Sep 03 '18 at 16:22

4 Answers


One way with GNU awk for multi-char RS and RT:

$ awk -v RS='(^|\n)line 2\nline 3\n' '{ORS=(RT ~ /^\n/ ? "\n" : "")} 1' file
line 1
line 4
line 2

With any awk:

$ cat file
line 2
line 3
line 1
line 2
line 3
line 4
line 2
line 3

$ awk '
    { rec = rec $0 RS }
    END {
        rec = RS rec
        gsub(/\nline 2\nline 3\n/,RS,rec)
        gsub(/^\n|\n$/,"",rec)
        print rec
    }
' file
line 1
line 4

The above assumes you want to match using regexps since that's what your posted code does. If you want to do literal string matches instead that's do-able too with some massaging:

$ cat tst.awk
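{ rec = rec $0 RS }
END {
    # Repeatedly cut the first literal occurrence of the block out of rec.
    while ( beg = index(RS rec,RS block RS) ) {
        out = out substr(RS rec,1,beg-1)
        rec = substr(RS rec,beg+length(block)+2)
    }
    # Rejoin with RS so text left after the last match stays on its own line.
    printf "%s", substr(out RS rec,2)
}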

$ awk -v block='line 2\nline 3' -f tst.awk file
line 1
line 4
Ed Morton
  • What does the `1` mean before `file` in the first block? – dibery Sep 03 '18 at 16:49
  • `1` is a true condition so when it's encountered awk invokes the default action which is to print the current record. It's equivalent to and idiomatic shorthand for writing `{print $0}`. – Ed Morton Sep 03 '18 at 16:50
  • @EdMorton +UV for the robustness of this answer, and removed mine due to its low quality in contrast and the points mentioned in comments. One thing though, do you think you could add an alternative to match a literal block with embedded line breaks? I.e. if multiple lines were loaded into a variable? – hmedia1 Sep 03 '18 at 17:16
  • That'd just be `block='(^|\n)line 2\nline 3\n'; awk -v RS="$block" '{ORS=(RT ~ /^\n/ ? "\n" : "")} 1' file` and similar for the non-gawk version. – Ed Morton Sep 03 '18 at 17:22
  • @αғsнιη fixed and I added some better sample input to test with so it covers the start/end of file cases too. – Ed Morton Sep 03 '18 at 17:53
  • this is probably a minor change; but your code only removes the first match? Anyways it is a pity that this becomes so complicated. – Chris Maes Sep 04 '18 at 06:49
  • As you can see from my answer - no, it does not only remove the first match. – Ed Morton Sep 04 '18 at 13:30
  • wrt complicated - the 2nd script is far from complicated and will work using any awk in any shell on any UNIX system, unlike every other answer you got, and the 3rd one is the only answer you got that works on literal strings instead of regexps. – Ed Morton Sep 04 '18 at 13:40

With GNU sed:

sed -z 's/line 2\nline 3\n//g;s/line 2\nline 3\n$//' infile
Chris Maes
ctac_

Not awk, but this is straightforward with Perl 5, as @tripleee pointed out. With the five-line input file you showed above saved as foo.txt:

perl -0777 -pe 's{^line 2\nline 3\n}{}gm' foo.txt

produces the desired three-line output.

Explanation:

  • -0777 causes perl to read the entire input as one string (see perlrun).
  • The /m modifier on the regex causes ^ to match at the beginning of a line (see perlre).
  • Edit: ^ will also match at the beginning of the file, so you can detect blocks of lines even if there is not a newline before them.
  • The separators between the lines are literal \ns because $ matches before the \n with the /m modifier. Therefore, it's easier just to match the \n.

Thanks to this U&L SE answer by Stéphane Chazelas for the basics.

cxw
  • I'm really not a fan of perl; but here it beats the competition... at least it is almost understandable :) – Chris Maes Sep 04 '18 at 06:46
  • @EdMorton Good catch. I have removed the second option. The first option does match at the beginning of the file; I just tried it. Perl v5.26.1. Thanks! – cxw Sep 04 '18 at 13:55

This might work for you (GNU sed):

sed '/^line 2$/!b;N;/^line 3$/Md;P;D' file

If a line does not match the string line 2, print it and begin the next cycle. Otherwise, append the following line and if that does match the string line 3, delete both lines. Otherwise, print then delete the first line and repeat.
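
The same hold-and-compare cycle can be sketched for a block of any length (such as the 5 lines mentioned in the question) with plain awk. This is a sketch, not part of the sed answer: the function name remove_block and the idea of keeping the block in its own file are assumptions made for this example, the block file is assumed to be non-empty, and lines are compared as literal strings rather than regexps.

```shell
# remove_block BLOCKFILE INPUTFILE
# Hypothetical helper: prints INPUTFILE without any run of consecutive lines
# that equals BLOCKFILE line for line. Only n lines are held in memory.
remove_block() {
    awk '
        # First file: remember the block lines as want[1..n].
        NR == FNR { want[++n] = $0; next }

        {
            # Append the current line to the window of held lines.
            buf[++len] = $0
            if (len < n) next

            # Window is full: compare it with the block as strings.
            same = 1
            for (i = 1; i <= n; i++)
                if ((buf[i] "") != (want[i] "")) { same = 0; break }

            # Full match: drop the whole window.
            if (same) { len = 0; next }

            # No match: release the oldest line and slide the window by one.
            print buf[1]
            for (i = 1; i < len; i++) buf[i] = buf[i + 1]
            len--
        }

        # Flush whatever is still held at end of input.
        END { for (i = 1; i <= len; i++) print buf[i] }
    ' "$1" "$2"
}
```

With a block file containing line 2 and line 3, remove_block block.txt file should leave line 1, line 4 and the final line 2 for the sample input from the question. Because the input is processed as a stream, this variant may suit large files better than the approaches that read the whole file at once.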

potong