Using a single sed invocation to head the first H lines and tail the last T lines

Question

I wrote a C program a while back to summarize a text file by doing both a head and a tail at the same time, with only a single read-through of piped input. Example:

$ headtail -h 3 -t 3 < /tmp/x10
line01
line02
line03
... 4 output lines omitted ...
line08
line09
line10

It works, but I feel dirty for not having a nifty sed alias that can do this. Having found this SO answer that uses sed to print the last N lines, it seems achievable now, but I'm not quite there.

For example, the individual head and tail work:

$ sed -n -e '1,3p' < /tmp/x10
line01
line02
line03

$ sed -n -e ':a; $p; N; 4,$D; ba' < /tmp/x10
line08
line09
line10

But my attempt at combining the two fails:

$ sed -n -e '1,3p; :a; $p; N; 4,$D; ba' < /tmp/x10
line01
line08
line09
line10

It'd also be nice for it to work when H+T is greater than the number of lines N in the file (it should then act like cat), and for it to print a separator indicating that some lines were omitted from the middle (the number omitted would be nice, but I could live without it).

John1024 · Answer 1 · 2019-02-22T22:45:54.980

Try:

$ seq 10 | sed -n -e '1,3{p;b}; :a; $p; N; 7,$D; ba'
1
2
3
8
9
10

(The 7 comes from adding together 3 (head) plus 3 (tail) plus 1.)

If we increase the tail from 3 to 7, we get the whole file:

$ seq 10 | sed -n -e '1,3{p;b}; :a; $p; N; 11,$D; ba'
1
2
3
4
5
6
7
8
9
10

(11 is 3 (head) plus 7 (tail) plus 1.)

How it works

1,3{p;b}

For each of the first three lines, we print it (p) and then branch (b) past the rest of the commands in the script.

:a; $p; N; 7,$D; ba

This works the same as before, except that these commands never see the first three lines. Consequently, we have to change the starting point for the D command to 7.
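
The head-plus-tail-plus-one arithmetic can be left to the shell. The following is only a sketch: the function name headtail_sed is made up for illustration, it assumes GNU sed, and it assumes both counts are at least 1.

```shell
# Hypothetical wrapper (the name is illustrative, not part of the answer above).
# Usage: headtail_sed H T < file   (assumes GNU sed, H >= 1 and T >= 1)
headtail_sed() {
    h=$1
    t=$2
    d=$((h + t + 1))   # first line number at which D starts trimming the buffer
    # \$ keeps the shell from expanding sed's "last line" address
    sed -n -e "1,${h}{p;b}; :a; \$p; N; ${d},\$D; ba"
}

seq 10 | headtail_sed 3 3
```

With H=3 and T=3 this expands to the same script as the first command in this answer.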

Ed Morton · Accepted Answer · 2019-02-23T06:34:57.670

There's no need for C programs or convoluted sed scripts, all you need is a clear, simple, portable, efficient awk script:

$ seq 10 | awk -v h=3 -v t=3 'NR<=h; {a[NR%t]=$0} END{for (i=1; i<=t; i++) print a[(NR+i)%t]}'
1
2
3
8
9
10

$ seq 10 | awk -v h=3 -v t=3 'NR<=h; {a[NR%t]=$0} END{print "skipped", NR-(t+h); for (i=1; i<=t; i++) print a[(NR+i)%t]}'
1
2
3
skipped 4
8
9
10

You didn't say what your requirements are if the ranges overlap so I'm just including overlapping lines in both output sections and printing a negative value for skipped, e.g.:

$ seq 10 | awk -v h=7 -v t=5 'NR<=h; {a[NR%t]=$0} END{print "skipped", NR-(t+h); for (i=1; i<=t; i++) print a[(NR+i)%t]}'
1
2
3
4
5
6
7
skipped -2
6
7
8
9
10

but whatever your requirements are for edge cases they'd be trivial to implement.
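
As one possible reading of those edge cases, here is a sketch that behaves like cat when the head and tail ranges overlap and prints the separator only when lines were actually omitted. The function name headtail_awk and the separator wording (borrowed from the question's example output) are assumptions, and t is assumed to be at least 1.

```shell
# Hypothetical variant (name and separator text are illustrative).
# Usage: headtail_awk H T < file   (assumes T >= 1)
headtail_awk() {
    awk -v h="$1" -v t="$2" '
        NR <= h { print; next }      # head lines are printed right away
        { a[NR % t] = $0 }           # ring buffer holding the last t lines after the head
        END {
            omitted = NR - h - t
            if (omitted > 0)
                printf "... %d output lines omitted ...\n", omitted
            # start after the head, or at the first of the last t lines
            start = (omitted > 0) ? NR - t + 1 : h + 1
            for (i = start; i <= NR; i++)
                print a[i % t]
        }'
}

seq 10 | headtail_awk 3 3
```

Because head lines skip the buffer via next, a line is never printed twice, so an input shorter than h+t comes out unchanged.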

potong · Answer 3 · 2019-02-23T17:59:14.990

This might work for you (GNU sed):

sed -E '1,5p;H;$!d;x;s/.*((\n[^\n]*){3})$/\1/;s/./==========&/' file

This prints the first five and last three lines separated by ==========.

The commands use a range to print the first n lines, and every line is also appended to the hold space. At the end of the file the hold space is reduced to the required number of trailing lines, and the separator is inserted in front of the leading newline.
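
To make the two counts adjustable, the same script can be built inside a shell function. This is a sketch only: the name headtail_hold is hypothetical, GNU sed is assumed, and like the one-liner it holds the whole input in memory and repeats lines when the head and tail ranges overlap.

```shell
# Hypothetical wrapper around the hold-space script (name is illustrative).
# Usage: headtail_hold H T < file   (assumes GNU sed, H >= 1 and T >= 1)
headtail_hold() {
    # Inside double quotes \n and \1 reach sed unchanged; \$ becomes a literal $.
    sed -E "1,${1}p;H;\$!d;x;s/.*((\n[^\n]*){${2}})\$/\1/;s/./==========&/"
}

seq 10 | headtail_hold 5 3
```

For a ten-line input this prints lines 1 to 5, the ========== separator, and then lines 8 to 10.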

Another solution, less memory intensive but restricted to cases where the number of head lines is less than or equal to the number of tail lines, is:

sed ':a;$!{N;s/[^\n]\+/&/5;3{p;x;s/^/==========/p;x};Ta};$P;D' file

Here the first three and the last five lines are printed with a separator.
