Print first few and last few lines of file through a pipe with "..." in the middle

Question

Problem Description

This is my file (it contains the numbers 1 through 10, one per line)

I would like to send the cat output of this file through a pipe and receive this:

% cat file | some_command
1
2
...
9
10

Attempted solutions

Here are some solutions I've tried, with their output:

% cat temp | (head -n2 && echo '...' && tail -n2)
1
2
...

% cat temp | tee >(head -n3) >(tail -n3) >/dev/null
1
2
3
8
9
10
# I don't know how to get the ...

% cat temp | sed -e 1b -e '$!d'
1
10

% cat temp | awk 'NR==1;END{print}'
1
10
# Can only get 2 lines

dawg · Accepted Answer · 2021-12-08T14:22:34.333

8

An awk:

awk -v head=2 -v tail=2 'FNR==NR && FNR<=head
FNR==NR && cnt++==head {print "..."}
NR>FNR && FNR>(cnt-tail)' file file

Or if a single pass is important (and memory allows), you can use perl:

perl -0777 -lanE 'BEGIN{$head=2; $tail=2;}
END{say join("\n", @F[0..$head-1],("..."),@F[-$tail..-1]);}' file

Or, an awk that is one pass:

awk -v head=2 -v tail=2 'FNR<=head
{lines[FNR]=$0}
END{
    print "..."
    for (i=FNR-tail+1; i<=FNR; i++) print lines[i]
}' file

Or, there is nothing wrong with the direct 'caveman' approach:

head -2 file; echo "..."; tail -2 file

Any of these prints:

1
2
...
9
10

In terms of efficiency, here are some stats.

For small files (i.e., less than 10 MB or so), all of these take less than 1 second, and the 'caveman' approach takes 2 ms.

I then created a 1.1 GB file with seq 99999999 >file

Two-pass awk: 50 seconds
One-pass perl: 10 seconds
One-pass awk: 29 seconds
'Caveman': 2 ms

edited Dec 08 '21 at 14:22

answered Dec 07 '21 at 21:13

dawg


2

Now handle the cases where the line count is less than head and tail, and the case where the head and tail lines intersect ^^ – Léa Gris Dec 07 '21 at 22:42
They all handle overlapping head and tail. – dawg Dec 08 '21 at 01:20
1

Especially with large files, the "caveman" approach is the best, because it's the only one that won't read the whole file (`head` stops after a few lines, and `tail` seeks to the end and works its way back). Try the `perl` version with a file that's larger than your available RAM and you're in for a surprise. – Guntram Blohm Dec 08 '21 at 09:42
1

@dawg, I think that by overlapping head and tail, they mean e.g. a case where the file has only three lines. Given three lines `1`, `2`, and `3`, that last `head`+`tail` solution would print `1`, `2`, `...`, `2`, `3`, which is probably technically correct at least for some phrasings of the problem, but it might also be considered misleading. Looks like the others print the same. – ilkkachu Dec 08 '21 at 10:41
@ilkkachu: I think that for a three-line file it is at best ambiguous what the 'correct result' is. In my view, `1\n2\n...\n2\n3` is the most correct. What do you think would be a better result for that? – dawg Dec 08 '21 at 13:42
@GuntramBlohm: Agreed and I added a note to that effect. The two pass awk is reasonable as well in that situation. – dawg Dec 08 '21 at 13:45
1

@dawg, in this narrow context of this Q, we don't know, since the post doesn't say. But more generally, `1\n2\n...\n2\n3` implies that there's something removed in the part where it says `...`, and that's not true in the case of a three or four-line file. It would make more sense to me to print a three line file just as-is, without the ellipsis. _In general_. Of course we don't know what they're doing in this particular case, if there's a use-case that requires/expects all four lines and the `...`, and where the doubled `2` line makes sense, then that needs to be done. – ilkkachu Dec 08 '21 at 13:56

score 1 · Answer 2 · answered Dec 07 '21 at 21:12

1

You may consider this awk solution:

awk -v top=2 -v bot=2 'FNR == NR {++n; next} FNR <= top || FNR > n-bot; FNR == top+1 {print "..."}' file{,}

1
2
...
9
10

answered Dec 07 '21 at 21:12

anubhava


M. Nejat Aydin · Answer 3 · 2021-12-08T00:28:09.517

1

Two single-pass sed solutions:

sed '1,2b
     3c\
...
     N
     $!D'

and

sed '1,2b
     3c\
...
     $!{h;d;}
     H;g'

edited Dec 08 '21 at 00:28

answered Dec 07 '21 at 22:04

M. Nejat Aydin


2

How does this work? It would be more helpful for future readers with related problems (like a count other than 2) if you commented the code and said what you're doing with the pattern / hold space. – Peter Cordes Dec 08 '21 at 08:36
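One possible reading of the first sed script, offered as a sketch rather than the answer author's own explanation. The wrapper name `first_last_two` is hypothetical and the comments are added for illustration; the sed commands themselves are unchanged.

```shell
# Hypothetical wrapper around the first sed script above, with comments added.
first_last_two() {
  sed '
    # Lines 1-2: branch to the end of the script, which prints them unchanged.
    1,2b
    # Line 3: replace it with the literal text "..." and start the next cycle.
    3c\
...
    # Line 4 onward: append the next input line, so the pattern space
    # holds two lines joined by a newline.
    N
    # Unless the last line has just been read, delete the older of the two
    # lines and restart the script without reading new input. This keeps a
    # sliding window of two lines; at the end the window is printed.
    $!D
  '
}
```

Usage: `seq 10 | first_last_two`. The N / $!D pair is what fixes the tail count at 2; the second script gets the same effect by keeping the previous line in the hold space (`h`) and joining it with the last line (`H;g`).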

score 0 · Answer 4 · answered Dec 08 '21 at 16:36

Assumptions:

as OP has stated, a solution must be able to work with a stream from a pipe
the total number of lines coming from the stream is unknown
if the total number of lines is less than the sum of the head/tail offsets then we'll print duplicate lines (we can add more logic if OP updates the question with more details on how to address this situation)

A single-pass awk solution that implements a queue in awk to keep track of the most recent N lines; the queue allows us to limit awk's memory usage to just N lines (as opposed to loading the entire input stream into memory, which could be problematic when processing a large volume of lines/data on a machine with limited available memory):

h=2 t=3

cat temp | awk -v head=${h} -v tail=${t} '
    { if (NR <= head) print $0
      lines[NR % tail] = $0
    }

END { print "..."

      if (NR < tail) i=0
      else           i=NR

      do { i=(i+1)%tail
           print lines[i]
         } while (i != (NR % tail) )
    }'

This generates:

1
2
...
8
9
10

Demonstrating the overlap issue:

$ cat temp4
1
2
3
4

With h=3;t=3 the proposed awk code generates:

$ cat temp4 | awk -v head=${h} -v tail=${t} '...'
1
2
3
...
2
3
4

Whether or not this is the 'correct' output will depend on OP's requirements.
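If duplicate lines are not wanted, one option is to print each line at most once and emit the "..." only when lines were actually dropped. The following is a minimal sketch under that assumption, not part of the answer above; the function name `head_tail_nodup` is hypothetical, and it assumes tail is at least 1.

```shell
# Hypothetical helper: head_tail_nodup HEAD TAIL  (reads stdin, assumes TAIL >= 1)
head_tail_nodup() {
  awk -v head="$1" -v tail="$2" '
    # Head lines are printed as they arrive and never enter the ring.
    NR <= head { print; next }
    # Every later line goes into a ring buffer holding at most tail lines.
    { ring[NR % tail] = $0 }
    END {
      # Only print the ellipsis when at least one line was left out.
      if (NR > head + tail) print "..."
      # First line number to print from the ring, skipping the head lines.
      start = NR - tail + 1
      if (start <= head) start = head + 1
      for (i = start; i <= NR; i++) print ring[i % tail]
    }'
}
```

With the four-line temp4 file and h=3, t=3, `cat temp4 | head_tail_nodup 3 3` prints the four lines as-is, with no ellipsis and no repeats. Memory use stays bounded by the tail count, as in the queue version.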

Cyrus · Answer 5 · 2021-12-07T21:11:12.043

-1

I suggest with bash:

(head -n 2; echo "..."; tail -n 2) < file

Output:

1
2
...
9
10

edited Dec 07 '21 at 21:11

answered Dec 07 '21 at 21:09

Cyrus


Why doesn't the OP's solution with `cat temp | (head && echo && tail)` work? – John Kugelman Dec 07 '21 at 21:11
2

It looks like `head` overreads from the input in both cases and then tries to `lseek` backwards. That works if `file` is redirected, but not if the input is a pipe. I'm curious how portable it is to rely on this behavior. Does it just happen to work here, or is the behavior guaranteed in, say, POSIX? – John Kugelman Dec 07 '21 at 21:15
@JohnKugelman: I cannot answer your questions. – Cyrus Dec 07 '21 at 21:17
@JohnKugelman It might be an issue with a dependency or setting on macOS Monterey, because I think this solution used to work for me – Sam Dec 07 '21 at 21:19
@Cyrus It needs to come through a pipe, adding a space after `n` makes no difference – Sam Dec 07 '21 at 21:20
@Sam: With GNU sed: `cat file | sed -n -e '1,2p; 2a ...' -e '9,10p'`? – Cyrus Dec 07 '21 at 21:31
Fails with `seq 10 | (head -n 2; echo "..."; tail -n 2)` on Arch GNU/Linux, head/tail from GNU coreutils 8.32. – Peter Cordes Dec 08 '21 at 08:41
Similarly fails with `head` from Busybox or whatever the `head` on my Mac is. I'm not surprised, `head` would need to read byte-by-byte from the pipe to know not to overread. At least Bash's `read` does exactly that though, but I can't find if it's actually defined as mandatory for `read` (other than by some implicit assumption, at least). – ilkkachu Dec 08 '21 at 10:54
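Following the point about `read` in the comments above, here is a sketch that replaces `head` with a shell `read` loop so that the lines meant for `tail` stay in the pipe. It assumes the shell's `read` consumes input no further than the end of each line when stdin is a pipe (bash and dash behave this way); the function name `head_tail_pipe` is hypothetical.

```shell
# Hypothetical helper: head_tail_pipe HEAD TAIL  (reads stdin, works on a pipe)
# The body runs in a subshell so its variables do not leak into the caller.
head_tail_pipe() (
  n_head=$1
  n_tail=$2
  i=0
  # Assumption: read stops at the newline, so nothing beyond the head
  # lines is consumed from the pipe.
  while [ "$i" -lt "$n_head" ] && IFS= read -r line; do
    printf '%s\n' "$line"
    i=$((i + 1))
  done
  echo '...'
  # tail receives whatever is still unread on stdin.
  tail -n "$n_tail"
)
```

Usage: `seq 10 | head_tail_pipe 2 2`. Because the head lines are consumed before `tail` starts, a short input such as three lines prints `1`, `2`, `...`, `3` with no repeated line. The read loop is slow for large head counts, so this is only reasonable when the head is a few lines.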
