
I have been trying to come up with a sed command that will pull certain lines from blocks of text separated by a blank line in a file. The blocks of text are as below.

# cat test_file.txt
line 1
line 2
line 3
line 4
line 5

line 1 
line 2
line 3
line 4
line 5

line 1 
line 2
line 3
line 4
line 5

I am trying to pull out lines 2 and 4 from each block, so the output would look like this:

line 2
line 4

line 2
line 4

line 2 
line 4

I came up with a way to do it for the first block of text using sed:

# sed -n -e 2p -e 4p test_file.txt
line 2
line 4

But I haven't been able to find a way to make it continue for each block of text until the end of the file. Any pointers would be greatly appreciated.
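
If every block is guaranteed to have exactly five lines followed by a single blank line (true of the sample, but an assumption in general), the attempt above can be extended with GNU sed's first~step addresses, which match every step-th line starting at line first. This is only a sketch for fixed-size blocks; it breaks as soon as one block is longer or shorter than the others.

```shell
# Recreate the sample input: three 5-line blocks separated by single blank lines.
for i in 1 2 3; do
  [ "$i" -gt 1 ] && echo
  printf 'line %s\n' 1 2 3 4 5
done > test_file.txt

# GNU sed only: N~6 matches lines N, N+6, N+12, ...
# 2~6 and 4~6 are lines 2 and 4 of each block; 6~6 is the blank separator.
sed -n '2~6p;4~6p;6~6p' test_file.txt
```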

oguz ismail
chrisdow38
  • sed is for simple substitutions on individual strings, that is all, for any other text manipulation the first tool to consider is awk. – Ed Morton Oct 26 '19 at 01:55

3 Answers

awk's paragraph mode exists specifically to handle blank-line-separated records/blocks of text like the ones you're dealing with:

$ awk 'BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"} {print $2, $4}' file
line 2
line 4

line 2
line 4

line 2
line 4

From the POSIX standard:

If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input
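
To see the behaviour that quote describes, the sketch below feeds awk a hypothetical input (uneven.txt, made up for this illustration) that starts with a blank line and has two blank lines between its blocks. Paragraph mode should still yield exactly two records, and with FS="\n" each line of a block becomes one field.

```shell
# Hypothetical input: a leading blank line and two blank lines between blocks.
printf '\nline 1\nline 2\nline 3\nline 4\nline 5\n\n\nline 1\nline 2\nline 3\nline 4\nline 5\n' > uneven.txt

# RS="" enables paragraph mode; FS="\n" makes each line of a block a field.
awk 'BEGIN{RS=""; FS="\n"} {print "record " NR ": " NF " fields, $2=" $2 ", $4=" $4}' uneven.txt
```

This prints:

record 1: 5 fields, $2=line 2, $4=line 4
record 2: 5 fields, $2=line 2, $4=line 4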

If you don't want a blank line printed after the final record:

$ awk 'BEGIN{RS=""; FS=OFS="\n"} NR>1{print prev ORS} {prev=$2 OFS $4} END{print prev}' file
line 2
line 4

line 2
line 4

line 2
line 4
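
Another common way to avoid the trailing blank line, sketched here as an alternative to buffering the previous record, is to print the separator before every record except the first:

```shell
# Recreate the sample input: three 5-line blocks separated by single blank lines.
for i in 1 2 3; do
  [ "$i" -gt 1 ] && echo
  printf 'line %s\n' 1 2 3 4 5
done > test_file.txt

# Paragraph mode again, but the blank separator is printed *before* each
# record except the first, so nothing follows the final record.
awk 'BEGIN{RS=""; FS="\n"} {printf "%s%s\n%s\n", (NR > 1 ? "\n" : ""), $2, $4}' test_file.txt
```

Because the separator is only emitted between records, the output ends with the final "line 4" and no blank line after it.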

Or, if you don't want to use paragraph mode for some reason:

$ awk 'BEGIN{tgts[2]; tgts[4]} !NF{print ""; lineNr=0; next} ++lineNr in tgts' file
line 2
line 4

line 2
line 4

line 2
line 4
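
The target-array approach generalizes easily. The sketch below (the variable name want is made up for illustration) passes the wanted line numbers in with -v instead of hard-coding them in the BEGIN block:

```shell
# Recreate the sample input: three 5-line blocks separated by single blank lines.
for i in 1 2 3; do
  [ "$i" -gt 1 ] && echo
  printf 'line %s\n' 1 2 3 4 5
done > test_file.txt

awk -v want='2,4' '
  # Turn "2,4" into the set of wanted line numbers.
  BEGIN { n = split(want, w, ","); for (i = 1; i <= n; i++) tgts[w[i]] }
  # Blank line: print the separator and restart counting for the next block.
  !NF { print ""; lineNr = 0; next }
  # Print a line when its position inside the block is one of the targets.
  ++lineNr in tgts
' test_file.txt
```

With -v want='1,5' the same command would print the first and last line of each block instead.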
Ed Morton
  • This way there will be an extra blank line at the end of the output; that's why I avoided null-RS – oguz ismail Oct 26 '19 at 08:36
    In my experience, whatever generates an input file with blank lines like that is usually putting a blank line at the end of every block rather than just between blocks. Even when it isn't, having a blank line at the end of every block in the output doesn't hurt and makes further processing easier, because you don't need to treat the last block differently from all the other blocks. So it's usually a good thing to just generate a blank line after every block, and rarely a bad thing. It makes the code vastly easier to do whatever you want with each block, too. – Ed Morton Oct 26 '19 at 13:41
    @oguzismail I updated my answer to show how to do it if you don't want a blank line after the final output record. – Ed Morton Oct 26 '19 at 13:50

I'd use awk for this, e.g.:

awk '(!NF&&m=NR)||NR-m==2||NR-m==4' file
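
For readers unfamiliar with the idiom, here is an equivalent, commented expansion of that one-liner (a rewrite for readability, not the answerer's original code). The variable m holds the line number of the most recent blank line and is 0 before the first one.

```shell
# Recreate the sample input: three 5-line blocks separated by single blank lines.
for i in 1 2 3; do
  [ "$i" -gt 1 ] && echo
  printf 'line %s\n' 1 2 3 4 5
done > test_file.txt

awk '
  # Blank line: remember where it is (m) and print it as the block separator.
  !NF { m = NR; print; next }
  # Otherwise print the 2nd and 4th lines after the last blank line; m starts
  # at 0, so the first block is counted from the start of the file.
  NR - m == 2 || NR - m == 4
' test_file.txt
```
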
oguz ismail

This might work for you (GNU sed):

sed -n '/\S/{n;p;n;n;p;:a;n;//ba;p}' file

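The -n option disables automatic printing, so only the explicit p commands produce output. When a non-blank line starts a block, print the second and fourth lines of that block, then throw away the remaining non-blank lines and print the first blank one. Repeat for each block.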
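
Here is the same GNU sed script spread over several lines with comments, as a sketch for readability; the commands are unchanged and only the layout differs. Note that \S and the empty-regex reuse shown here are being relied on as GNU sed behaviour.

```shell
# Recreate the sample input: three 5-line blocks separated by single blank lines.
for i in 1 2 3; do
  [ "$i" -gt 1 ] && echo
  printf 'line %s\n' 1 2 3 4 5
done > test_file.txt

sed -n '
# A non-blank line starts a block (\S is a GNU sed extension).
/\S/ {
  # Advance to line 2 of the block and print it.
  n
  p
  # Skip line 3, advance to line 4 and print it.
  n
  n
  p
  # Keep reading while lines are non-blank; the empty regex // reuses /\S/.
  :a
  n
  //ba
  # This is the first blank line after the block: print it as the separator.
  p
}
' test_file.txt
```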

potong