
Problem Description

This is my file

1
2
3
4
5
6
7
8
9
10

I would like to send the cat output of this file through a pipe and receive this

% cat file | some_command
1
2
...
9
10

Attempted solutions

Here are some solutions I've tried, with their output

% cat temp | (head -n2 && echo '...' && tail -n2)
1
2
...
% cat temp | tee >(head -n3) >(tail -n3) >/dev/null
1
2
3
8
9
10
# I don't know how to get the ...
% cat temp | sed -e 1b -e '$!d'
1
10

% cat temp | awk 'NR==1;END{print}'
1
10
# Can only get 2 lines
John Kugelman
Sam

5 Answers


A two-pass awk (it reads the file twice, so it needs a real file rather than a pipe):

awk -v head=2 -v tail=2 'FNR==NR && FNR<=head
FNR==NR && cnt++==head {print "..."}
NR>FNR && FNR>(cnt-tail)' file file

Or if a single pass is important (and memory allows), you can use perl:

perl -0777 -lanE 'BEGIN{$head=2; $tail=2;}
END{say join("\n", @F[0..$head-1],("..."),@F[-$tail..-1]);}' file   

Or, an awk that is one pass:

awk -v head=2 -v tail=2 'FNR<=head
{lines[FNR]=$0}
END{
    print "..."
    for (i=FNR-tail+1; i<=FNR; i++) print lines[i]
}' file

Or, there is nothing wrong with the direct 'caveman' approach:

head -2 file; echo "..."; tail -2 file

Any of these prints:

1
2
...
9
10

In terms of efficiency, here are some stats.

For small files (i.e., less than 10 MB or so), all of these take less than 1 second, and the 'caveman' approach takes 2 ms.

I then created a 1.1 GB file with seq 99999999 >file

  • Two-pass awk: 50 seconds
  • One-pass perl: 10 seconds
  • One-pass awk: 29 seconds
  • 'Caveman': 2 ms
dawg
  • Now handle cases where lines count is less than head and tail, and case when head and tail lines intersects ^^ – Léa Gris Dec 07 '21 at 22:42
  • They all handle overlapping head and tail. – dawg Dec 08 '21 at 01:20
  • Especially with large files, the "caveman" approach is the best, because it's the only one that won't read the whole file (`head` stops after a few lines, and `tail` seeks to the end and works its way back). Try the `perl` version with a file that's larger than your available RAM and you're in for a surprise. – Guntram Blohm Dec 08 '21 at 09:42
  • @dawg, I think that by overlapping head and tail, they mean e.g. a case where the file has only three lines. Given three lines `1`, `2`, and `3`, that last `head`+`tail` solution would print `1`, `2`, `...`, `2`, `3`, which is probably technically correct at least for some phrasings of the problem, but it might also be considered misleading. Looks like the others print the same. – ilkkachu Dec 08 '21 at 10:41
  • @ilkkachu: I think that for a three-line file it is at best ambiguous what the 'correct result' is. In my view, `1\n2\n...\n2\n3` is the most correct. What do you think is a better result for that? – dawg Dec 08 '21 at 13:42
  • @GuntramBlohm: Agreed and I added a note to that effect. The two pass awk is reasonable as well in that situation. – dawg Dec 08 '21 at 13:45
  • @dawg, in this narrow context of this Q, we don't know, since the post doesn't say. But more generally, `1\n2\n...\n2\n3` implies that there's something removed in the part where it says `...`, and that's not true in the case of a three or four-line file. It would make more sense to me to print a three line file just as-is, without the ellipsis. _In general_. Of course we don't know what they're doing in this particular case, if there's a use-case that requires/expects all four lines and the `...`, and where the doubled `2` line makes sense, then that needs to be done. – ilkkachu Dec 08 '21 at 13:56

You may consider this awk solution:

awk -v top=2 -v bot=2 'FNR == NR {++n; next} FNR <= top || FNR > n-bot; FNR == top+1 {print "..."}' file{,}

1
2
...
9
10
anubhava

Two single pass sed solutions:

sed '1,2b
     3c\
...
     N
     $!D'

and

sed '1,2b
     3c\
...
     $!{h;d;}
     H;g'
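
As a hedged sketch of how the first script works, here it is again with comments, wrapped in a shell function so it can sit at the end of a pipe. The function name `head2_tail2_sed` is made up for illustration. The counts are fixed at 2 and 2 because the pattern space only ever holds two lines; a different tail count would need a different script.

```shell
# Hypothetical wrapper: first 2 lines, "...", last 2 lines of stdin.
head2_tail2_sed() {
    sed '
        # Lines 1 and 2: branch to the end of the script, which prints them as-is.
        1,2b
        # Line 3: replace it with the literal text "..." and start the next cycle.
        3c\
...
        # Every later line: append the next input line to the pattern space,
        # so the pattern space now holds two lines.
        N
        # Unless the last input line has been read, delete the older of the
        # two lines and rerun the script without reading new input.
        $!D
        # Once the last line is in, the two remaining lines are printed.
    '
}
```

Usage: `cat file | head2_tail2_sed`. The second script gets the same result with the hold space instead: `h` keeps a copy of the most recent line, and on the last line `H;g` joins that saved line with the final one and prints both.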
M. Nejat Aydin
  • How does this work? It would be more helpful for future readers with related problems (like a count other than 2) if you commented the code and said what you're doing with the pattern / hold space. – Peter Cordes Dec 08 '21 at 08:36

Assumptions:

  • as OP has stated, a solution must be able to work with a stream from a pipe
  • the total number of lines coming from the stream is unknown
  • if the total number of lines is less than the sum of the head/tail offsets then we'll print duplicate lines (we can add more logic if OP updates the question with more details on how to address this situation)

A single-pass awk solution that uses a queue (a circular buffer) to keep track of the most recent N lines. The queue limits awk's memory usage to just N lines, as opposed to loading the entire input stream into memory, which could be problematic when processing a large volume of data on a machine with limited available memory:

h=2 t=3

cat temp | awk -v head=${h} -v tail=${t} '
    { if (NR <= head) print $0
      lines[NR % tail] = $0
    }

END { print "..."

      if (NR < tail) i=0
      else           i=NR

      do { i=(i+1)%tail
           print lines[i]
         } while (i != (NR % tail) )
    }'

This generates:

1
2
...
8
9
10

Demonstrating the overlap issue:

$ cat temp4
1
2
3
4

With h=3;t=3 the proposed awk code generates:

$ cat temp4 | awk -v head=${h} -v tail=${t} '...'
1
2
3
...
2
3
4

Whether or not this is the 'correct' output will depend on OP's requirements.
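
If the duplicated lines are not wanted, one possible variant is sketched below. It rests on an assumption the question does not confirm: that short input should be printed as-is, with no `...` and no repeated lines. The function name `head_tail_awk` is made up for illustration, and the sketch assumes the tail count is at least 1.

```shell
# Hypothetical helper: head_tail_awk [HEAD [TAIL]]; reads stdin, TAIL must be >= 1.
head_tail_awk() {
    awk -v head="${1:-2}" -v tail="${2:-2}" '
        # The first "head" lines are printed immediately and never buffered.
        NR <= head { print; next }
        # Later lines go into a circular buffer holding at most "tail" lines.
        { buf[NR % tail] = $0 }
        END {
            # Print the ellipsis only if lines were actually skipped.
            if (NR > head + tail) print "..."
            # Start at the first tail line that was not already printed.
            start = NR - tail + 1
            if (start <= head) start = head + 1
            for (i = start; i <= NR; i++) print buf[i % tail]
        }'
}
```

Usage: `cat temp | head_tail_awk 2 3`. With the four-line `temp4` and `head_tail_awk 3 3`, this prints the four lines once each and omits the `...`.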

markp-fuso

With bash, I suggest:

(head -n 2; echo "..."; tail -n 2) < file

Output:

1
2
...
9
10
Cyrus
  • Why doesn't the OP's solution with `cat temp | (head && echo && tail)` work? – John Kugelman Dec 07 '21 at 21:11
  • It looks like `head` overreads from the input in both cases and then tries to `lseek` backwards. That works if `file` is redirected, but not if the input is a pipe. I'm curious how portable it is to rely on this behavior. Does it just happen to work here, or is the behavior guaranteed in, say, POSIX? – John Kugelman Dec 07 '21 at 21:15
  • @JohnKugelman: I cannot answer your questions. – Cyrus Dec 07 '21 at 21:17
  • @JohnKugelman It might be an issue with a dependency or setting on macOS Monterey, because I think this solution used to work for me – Sam Dec 07 '21 at 21:19
  • @Cyrus It needs to come through a pipe, adding a space after `n` makes no difference – Sam Dec 07 '21 at 21:20
  • @Sam: With GNU sed: `cat file | sed -n -e '1,2p; 2a ...' -e '9,10p'`? – Cyrus Dec 07 '21 at 21:31
  • Fails with `seq 10 | (head -n 2; echo "..."; tail -n 2)` on Arch GNU/Linux, head/tail from GNU coreutils 8.32. – Peter Cordes Dec 08 '21 at 08:41
  • Similarly fails with `head` from Busybox or whatever the `head` on my Mac is. I'm not surprised, `head` would need to read byte-by-byte from the pipe to know not to overread. At least Bash's `read` does exactly that though, but I can't find if it's actually defined as mandatory for `read` (other than by some implicit assumption, at least). – ilkkachu Dec 08 '21 at 10:54
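
Building on the last comment, here is a minimal sketch that replaces `head` with the shell's own `read`, so the head part takes only the lines it prints and leaves the rest of the pipe for `tail`. It assumes the shell's `read` does not read past the end of the line on a pipe, which common shells do in practice but which the comment above could not confirm is mandated. The function name `head_tail_read` is made up for illustration.

```shell
# Hypothetical helper: head_tail_read [HEAD [TAIL]]; reads stdin, works on a pipe.
head_tail_read() {
    head_n=${1:-2}
    tail_n=${2:-2}
    i=0
    # Consume and print the first head_n lines one at a time with read,
    # which (by assumption) leaves all later input unread in the pipe.
    while [ "$i" -lt "$head_n" ] && IFS= read -r line; do
        printf '%s\n' "$line"
        i=$((i + 1))
    done
    echo "..."
    # tail sees only what is left of the stream after the head lines.
    tail -n "$tail_n"
}
```

Usage: `cat file | head_tail_read 2 2`. Because the head lines are consumed before `tail` runs, short input is never duplicated: a three-line input prints `1`, `2`, `...`, `3`. The `...` is still printed even when nothing was skipped.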