I have a file that looks like this:

a: 0
a: 0
a: 0
a: 1
b: 1
c: 1
d: 1
e: 1
f: 1
a: 2
b: 2
c: 2
d: 2
e: 2
f: 2
a: 3
b: 3
c: 3
d: 3
e: 3
f: 3
c: 4
c: 4
c: 4

I want to output every a line and c line that fit the pattern <a line><any lines other than a or c lines><c line>, so the output would look like:

a: 1
c: 1

a: 2
c: 2

a: 3
c: 3

Note that neither the a: 0 lines at the beginning nor the c: 4 lines at the end are captured because they don't follow the pattern I mentioned. Note also that the b lines between the a and c lines are removed.

I've been trying to do this with lookarounds using pcregrep in Bash, but haven't found a solution yet. Any ideas?

Thanks!

Benjamin W.
gkeenley
  • why over-complicating with `pcre`, have you tried `grep -E '^[ac]'`? – P.... May 30 '19 at 18:53
  • What did you try? What does it have to do with `bash`? Where did you find that `bash` provides `pcregrep`? Post your attempts made so far – Inian May 30 '19 at 18:54
  • Why are there blanks between two lines each in the output? – Benjamin W. May 30 '19 at 18:56
  • @PS. I edited my original post. I'd left out some info before. You're right that the regex you suggested solves what I'd originally written, so I upvoted. – gkeenley May 30 '19 at 19:02
  • @Inian I updated my original post, I had left some info out. Re bash, I'm writing pcregrep commands in the bash shell on OS X (terminal). An example of what I've tried so far is pcregrep -M '^a(?=(^b))^c' , where I'm trying to match a line that starts with 'a' that has a 'b' line ahead of it, and a 'c' line, and to include the 'a' and 'c' lines only. – gkeenley May 30 '19 at 19:08
  • You probably want an awk solution, multi-line with grep gets messy fast. – Benjamin W. May 30 '19 at 19:12

1 Answer

Using awk

Try:

$ awk -F: '$1=="a"{aline=$0} $1=="c"{if(aline)print aline ORS $0 ORS; aline=""}' file
a: 1
c: 1

a: 2
c: 2

a: 3
c: 3

How it works

By default, awk reads in one line at a time.

  • -F:

    This tells awk to use : as the field separator.

  • $1=="a"{aline=$0}

    Every time an a line is observed, save the line in the variable aline. A later a line overwrites an earlier one, so only the most recent a line is kept.

  • $1=="c"{if(aline)print aline ORS $0 ORS; aline=""}

    Every time a c line is observed, check to see if we have a nonempty aline. If so, print aline and the current line, separated by newline characters. Also, set aline back to an empty string.

Multiline version

For those who prefer their commands spread over several lines:

awk -F: '
    $1=="a"{
        aline=$0
    }

    $1=="c"{
        if(aline)
            print aline ORS $0 ORS
        aline=""
    }' file
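
To check the edge cases from the question (a stray c line with no a line before it, and several a lines in a row), here is a small sketch that wraps the same awk program in a shell function and feeds it a short input. The function name pair_a_c and the sample lines are made up for illustration.

```shell
#!/bin/sh
# pair_a_c is a hypothetical helper name; the awk program is the one shown above.
pair_a_c() {
    awk -F: '
        $1 == "a" { aline = $0 }    # remember the most recent a line
        $1 == "c" {
            if (aline)
                print aline ORS $0 ORS    # emit the pair plus a blank line
            aline = ""                    # any c line resets the saved a line
        }' "$@"
}

# A leading c line, two a lines in a row, one pair, then a trailing c line.
printf '%s\n' 'c: 0' 'a: 0' 'a: 1' 'b: 1' 'c: 1' 'c: 2' | pair_a_c
```

Only a: 1 and c: 1 should be printed here: c: 0 has no saved a line, a: 0 is overwritten by a: 1, and c: 2 arrives after aline has been reset.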

Using sed

$ sed -n '/^a/h; /^c/{x;/^a/{p;x;s/$/\n/;p};h}' file
a: 1
c: 1

a: 2
c: 2

a: 3
c: 3

How it works

  • -n

    This tells sed not to print anything unless we explicitly ask it to.

  • /^a/h

    Any time we have a line that starts with a, we save it to the hold space.

  • /^c/{ x; /^a/{ p; x; s/$/\n/; p}; h}

    Any time we have a line that starts with c:

    • We swap (x) the pattern space with the hold space, so the pattern space now holds whatever was saved earlier.

    • If the new pattern space starts with a, we print (p) it, swap (x) again to get the c line back, append a newline to the end of the pattern space (s/$/\n/), and print (p) it.

    • Lastly, we copy (h) the current pattern space to the hold space. At this point it does not start with a, so the saved a line is cleared and cannot be paired with a later c line.
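
The s/$/\n/ step relies on sed treating \n in the replacement as a newline, which GNU sed does but which is not guaranteed elsewhere. Since the question mentions OS X, here is a sketch of a variant that avoids it: it prints the c line, empties the pattern space, and prints again to get the blank line. The commands are separated by newlines rather than semicolons, which should also sidestep differences in how sed implementations parse a closing brace. The function name pair_a_c_sed is made up for illustration, and this is an adaptation of the command above, not part of the original answer.

```shell
#!/bin/sh
# pair_a_c_sed is a hypothetical helper name.
# Same hold-space logic as the one-liner above, but the blank line is
# produced by printing an emptied pattern space instead of using \n.
pair_a_c_sed() {
    sed -n '
        /^a/h
        /^c/{
            x
            /^a/{
                p
                x
                p
                s/.*//
                p
            }
            h
        }' "$@"
}

# Two a lines in a row, one pair, then a trailing c line.
printf '%s\n' 'a: 0' 'a: 1' 'b: 1' 'c: 1' 'c: 4' | pair_a_c_sed
```

As with the original command, the final h leaves something other than an a line in the hold space, so the trailing c: 4 line is not printed.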

John1024