
Hi, I am looking for an awk command that can find two patterns and print the data between them to a file, but only if a third pattern appears between them. For example:

Start
1
2
middle
3
End
Start
1
2
End

And the output will be:
Start
1
2
middle
3
End

I found `awk '/patterns1/, /patterns2/' path > text.txt` on the web, but I need only the blocks that contain the third pattern.
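For comparison, here is a minimal sketch of what the range pattern from the question does with the sample input. It is wrapped in a shell function whose name, `range_blocks`, is only illustrative. A range pattern decides line by line and cannot look ahead, so it prints every Start..End block whether or not `middle` appears in it.

```shell
#!/bin/sh
# Sketch: the range pattern from the question, applied to the sample data.
# range_blocks is a hypothetical name chosen for illustration.
range_blocks() {
    # /Start/,/End/ prints every line from a Start match through the next
    # End match, regardless of what the block contains.
    awk '/Start/,/End/' "$@"
}

# Prints both blocks, including the one without "middle".
printf 'Start\n1\n2\nmiddle\n3\nEnd\nStart\n1\n2\nEnd\n' | range_blocks
```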

Ggdw
  • Fiddly, but doable. You'll need to save the material between Start and End, and when you come across Middle, note that the saved material should be printed, and as you process End, see whether the saved material should be printed. I've not got the time to reduce it to code now. (Save each `$0` in an array after you recognize Start; stop saving on End, printing the array if appropriate and clearing the array regardless.) – Jonathan Leffler Aug 19 '13 at 17:41
  • Also, can there be any lines of data not between Start and End? Or is it always a sequence of Start..End lines, but only some of them need to be printed? – Jonathan Leffler Aug 19 '13 at 17:51
  • It can be empty, but I need only the one with the middle pattern – Ggdw Aug 19 '13 at 17:56
  • So the file could contain: `Start`, `1`, `middle`, `2`, `End`, `Junk`, ``, `Start`, `3`, `4`, `End`? And the `Junk` and `` should not be included in the output? Only the first 5 lines should be echoed? – Jonathan Leffler Aug 19 '13 at 17:57
  • Junk or blank lines could be inside Start and End, as long as the middle is there too – Ggdw Aug 19 '13 at 18:18
  • I think you are saying that the first line in the file will be a Start line; the last line in the file will be an End line; and every intermediate End line will be immediately followed by a Start line. So the solutions using 'read paragraphs' based on `RS="End"` will work OK for your data. – Jonathan Leffler Aug 19 '13 at 18:21
  • Never use `awk '/patterns1/, /patterns2/' path`. It makes the trivial application negligibly briefer to write and anything else much harder to write. – Ed Morton Aug 19 '13 at 19:16
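The buffer-and-flag approach described in the first comment (save lines after Start, note `middle`, decide at End) can be sketched in portable awk roughly as follows. This is a sketch, not code from the thread: the function name `print_blocks_with_middle` is hypothetical, and the `^...$` anchors assume each marker occupies a whole line. Lines outside Start..End are never buffered, so junk between blocks is dropped, while blank or junk lines inside a matching block are kept.

```shell
#!/bin/sh
# Sketch of the buffer-and-flag idea from the comments.
# Assumption: Start, middle and End each occupy a whole line.
# print_blocks_with_middle is a hypothetical name.
print_blocks_with_middle() {
    awk '
        /^Start$/ { inblock = 1; n = 0; seen = 0 }   # begin a new buffer
        inblock   { buf[++n] = $0 }                  # save the current line
        inblock && /^middle$/ { seen = 1 }           # note the third pattern
        inblock && /^End$/ {
            if (seen)                                # print only if middle was seen
                for (i = 1; i <= n; i++) print buf[i]
            inblock = 0                              # stop saving until the next Start
            n = 0                                    # discard the buffer either way
        }
    ' "$@"
}

# Usage: print_blocks_with_middle file > text.txt
printf 'Start\n1\n2\nmiddle\n3\nEnd\nStart\n1\n2\nEnd\n' | print_blocks_with_middle
```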

5 Answers


And here is a solution without flags:

$ awk 'BEGIN{RS="End"}/middle/{printf "%s", $0; print RT}'  file
Start
1
2
middle
3
End

Explanation: RS is the record separator, so setting it to "End" makes awk split the input into records at each occurrence of "End".

Then the /middle/ pattern selects the records that contain "middle", and for each matching record we print the record itself with $0, followed by the separator that ended it with print RT (RT, a gawk extension, holds the text that matched RS).
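A sketch of the same RS/RT idea, assuming GNU awk is installed (RT is a gawk extension). As an assumption beyond this answer, RS is set to `End\n` instead of `End`, so the newline after each End belongs to the separator rather than starting the next record, which avoids a stray blank line before a later matching block. This presumes every End line ends with a newline. Like the original, it does not check for Start, and the function name is hypothetical.

```shell
#!/bin/sh
# Requires GNU awk: RT is a gawk extension.
# Assumption: every End line is followed by a newline, so RS can be "End\n".
# gawk_records_with_middle is a hypothetical name.
gawk_records_with_middle() {
    gawk '
        BEGIN { RS = "End\n" }              # a record ends at End plus its newline
        /middle/ { printf "%s%s", $0, RT }  # record text, then the separator that ended it
    ' "$@"
}

# Usage: gawk_records_with_middle file > text.txt
```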

user000001
  • Interesting...but I think it warrants some explanation of how it works. – Jonathan Leffler Aug 19 '13 at 17:47
  • This doesn't take `Start` into account; it just prints records containing both `middle` and `End`. You are also adding an extra newline after the record. – Chris Seymour Aug 19 '13 at 17:48
  • @JonathanLeffler added an explanation – user000001 Aug 19 '13 at 17:50
  • See my auxiliary question to the OP in the question-level comments. This works nicely if there's never anything other than Start..End sequences; not so hot if there's other data that has to be removed. Good idea if the data allows it to work, though. – Jonathan Leffler Aug 19 '13 at 17:53

This awk should work:

awk '$1=="Start"{ok++} ok>0{a[b++]=$0} $1=="middle"{ok++} $1=="End"{if(ok>1) for(i=0; i<length(a); i++) print a[i]; ok=0;b=0;delete a}' file

Start
1
2
middle
3
End

Expanded:

awk '$1 == "Start" {
   ok++
}
ok > 0 {
   a[b++] = $0
}
$1 == "middle" {
   ok++
}
$1 == "End" {
   if (ok > 1)
      for (i=0; i<length(a); i++)
         print a[i];
   ok=0;
   b=0;
   delete a
}' file
anubhava

Just use some flags with awk:

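# Run with: awk -f between.awk file   (between.awk is a hypothetical name for this script)
/Start/ {
    start_flag=1
    mid_flag=0
    n=NR                  # remember the line where this block starts
}

start_flag && /middle/ {
    mid_flag=1
}

start_flag {
    lines[NR]=$0          # buffer every line of the current block
}

/End/ {
    if (start_flag && mid_flag)
        for(i=n;i<=NR;i++)   # print from the Start line through this End line
            print lines[i]
    start_flag=mid_flag=0
    delete lines
}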
Chris Seymour

A modified version of user000001's awk:

awk '/middle/{printf "%s%s\n",$0,RT}' RS="End" file

EDIT: Added a test for the Start tag:

awk '/Start/ && /middle/{printf "%s%s\n",$0,RT}' RS="End" file
Jotne

This will work with any modern awk:

awk '/Start/{f=1;rec=""} f{rec=rec $0 ORS} /End/{if (rec~/middle/) printf "%s",rec}' file

The solutions that set RS to "End" are gawk-specific, which may be fine but it's definitely worth mentioning.
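An expanded, commented sketch of the portable one-liner above. As a small variation that is not part of the original answer, `f` and `rec` are also reset at End, so lines between blocks are never collected and a stray End cannot reprint the previous block. The function name is hypothetical.

```shell
#!/bin/sh
# Expanded form of the one-liner, with f and rec also reset at End
# (a variation on the original). print_records_with_middle is hypothetical.
print_records_with_middle() {
    awk '
        /Start/ { f = 1; rec = "" }       # start collecting a new record
        f       { rec = rec $0 ORS }      # append the line plus a newline
        /End/   {
            if (f && rec ~ /middle/)      # does the whole record contain middle?
                printf "%s", rec
            f = 0; rec = ""               # stop collecting until the next Start
        }
    ' "$@"
}

# Usage: print_records_with_middle file > text.txt
printf 'Start\n1\n2\nmiddle\n3\nEnd\nStart\n1\n2\nEnd\n' | print_records_with_middle
```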

Ed Morton