8

How can I read the first n lines and the last n lines of a file?

For n=2, I read online that (head -n2 && tail -n2) would work, but it doesn't.

$ cat x
1
2
3
4
5
$ cat x | (head -n2 && tail -n2)
1
2

The expected output for n=2 would be:

1
2
4
5
Peter Mortensen
Amir
  • 1
    http://unix.stackexchange.com/questions/139089/how-to-read-first-and-last-line-from-cat-output – Amir Feb 19 '15 at 20:16
  • Also, the link you sent is not helpful because I do not know the range really. I am looking for a simple solution for this – Amir Feb 19 '15 at 20:18
  • Interestingly, `cat x | (head -n2 && tail -n2)` doesn't work but `(head -n2 && tail -n2) < x` does. I'll have to meditate a bit on why that is. – Wintermute Feb 19 '15 at 20:20
  • 3
    What would the expected output be if the input file was 3 lines long? Would it be `1 2 3` or `1 2 2 3` or something else? What if it was only 2 lines long - would the output be `1 2 1 2` or `1 1 2 2` or `1 2` or something else? – Ed Morton Feb 19 '15 at 20:23
  • I edited the question with expected output. – Amir Feb 19 '15 at 20:25
  • OK, now show the expected output for `n=3`, `n=5`, and `n=7` given that input file. – Ed Morton Feb 19 '15 at 20:27
  • Well, my file has more than 2n lines. I am not sure what the answer to your question is when there are fewer than 2n lines. – Amir Feb 19 '15 at 20:28
  • 2
    I don't think the `head && tail` trick is reliable. `head` from GNU coreutils behaves differently for pipes and regular files (source: the source), reading blockwise in one case but not the other. Depending on implementation details like that seems like a bad idea -- it's not guaranteed that `head` will leave everything it doesn't print for `tail` to work with. – Wintermute Feb 19 '15 at 20:30
  • possible duplicate of [Truncate middle of piped text and replace with ellipsis in one command](http://stackoverflow.com/questions/28158685/truncate-middle-of-piped-text-and-replace-with-ellipsis-in-one-command) – rici Feb 19 '15 at 20:41
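
A minimal sketch of one way around the pipe problem described in the comments above, assuming it is acceptable to spool the stream to a temporary file first. `head` and `tail` then each open a regular file, so neither depends on how much input the other consumed. The function name `first_last` is made up for this illustration.

```shell
# first_last N: print the first N and the last N lines of standard input.
# Hypothetical helper (not from the original post). It copies stdin to a
# temporary file so that head and tail read a regular file, not a shared pipe.
first_last() {
    n=$1
    tmp=$(mktemp) || return 1
    cat > "$tmp"            # spool the whole stream to disk
    head -n "$n" "$tmp"     # first N lines
    tail -n "$n" "$tmp"     # last N lines
    rm -f "$tmp"
}
```

Usage: `cat x | first_last 2` prints lines 1, 2, 4 and 5 for the five-line file from the question. If the input has fewer than 2n lines, the overlapping lines are printed twice.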

10 Answers

10
head -n2 file && tail -n2 file
gniourf_gniourf
srd
3

Chances are you're going to want something like:

... | awk -v OFS='\n' '{a[NR]=$0} END{print a[1], a[2], a[NR-1], a[NR]}'

Or, if you need to specify a number, and taking into account @Wintermute's astute observation that you don't need to buffer the whole file, something like this is what you really want:

... | awk -v n=2 'NR<=n{print;next} {buf[((NR-1)%n)+1]=$0}
         END{for (i=1;i<=n;i++) print buf[((NR+i-1)%n)+1]}'

I think the math is correct on that. The idea is to use a rotating buffer indexed by NR modulo the size of the buffer, adjusted to use indices in the range 1 to n instead of 0 to (n-1).

To help with comprehension of the modulus operator used in the indexing above, here is an example with intermediate print statements to show the logic as it executes:

$ cat file   
1
2
3
4
5
6
7
8


$ cat tst.awk                
BEGIN {
    print "Populating array by index ((NR-1)%n)+1:"
}
{
    buf[((NR-1)%n)+1] = $0

    printf "NR=%d, n=%d: ((NR-1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
        NR, n, NR-1, (NR-1)%n, ((NR-1)%n)+1, ((NR-1)%n)+1, buf[((NR-1)%n)+1]

}
END { 
    print "\nAccessing array by index ((NR+i-1)%n)+1:"
    for (i=1;i<=n;i++) {
        printf "NR=%d, i=%d, n=%d: (((NR+i = %d) - 1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
            NR, i, n, NR+i, NR+i-1, (NR+i-1)%n, ((NR+i-1)%n)+1, ((NR+i-1)%n)+1, buf[((NR+i-1)%n)+1]
    }
}
$ 
$ awk -v n=3 -f tst.awk file
Populating array by index ((NR-1)%n)+1:
NR=1, n=3: ((NR-1 = 0) %n = 0) +1 = 1 -> buf[1] = 1
NR=2, n=3: ((NR-1 = 1) %n = 1) +1 = 2 -> buf[2] = 2
NR=3, n=3: ((NR-1 = 2) %n = 2) +1 = 3 -> buf[3] = 3
NR=4, n=3: ((NR-1 = 3) %n = 0) +1 = 1 -> buf[1] = 4
NR=5, n=3: ((NR-1 = 4) %n = 1) +1 = 2 -> buf[2] = 5
NR=6, n=3: ((NR-1 = 5) %n = 2) +1 = 3 -> buf[3] = 6
NR=7, n=3: ((NR-1 = 6) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, n=3: ((NR-1 = 7) %n = 1) +1 = 2 -> buf[2] = 8

Accessing array by index ((NR+i-1)%n)+1:
NR=8, i=1, n=3: (((NR+i = 9) - 1 = 8) %n = 2) +1 = 3 -> buf[3] = 6
NR=8, i=2, n=3: (((NR+i = 10) - 1 = 9) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, i=3, n=3: (((NR+i = 11) - 1 = 10) %n = 1) +1 = 2 -> buf[2] = 8
Ed Morton
  • 1
    +1 since this works in a pipe. You might add a more elaborate version that also handles files (streams) with fewer than 4 (head+tail) lines. – hek2mgl Feb 19 '15 at 20:31
  • @EdMorton But it would still need to buffer the whole stream in memory.. (However I don't see a way without buffering if it should work in a pipe, except saving the stream into a temporary file) – hek2mgl Feb 19 '15 at 20:34
  • Yeah, now it is not scalable for a large file. Still it works for me. – Amir Feb 19 '15 at 20:36
  • I wonder why cat x | (head -n2 && tail -n2) doesn't work... because this would be the perfect solution – Amir Feb 19 '15 at 20:39
  • You could keep just a running store of the last n lines you read. `tail` doesn't read from the beginning for regular files, mind you. – Wintermute Feb 19 '15 at 20:42
  • `awk -v n=2 'NR <= n { print; next } { delete lines[NR - n]; lines[NR] = $0 } END { for(i = NR - n + 1; i <= NR; ++i) print lines[i] }'` is what I was about to suggest. As a bonus, it doesn't print duplicate center lines in small files, although that wasn't a problem for OP. EDIT: Oh, I like the modulo idea. – Wintermute Feb 19 '15 at 20:46
  • To have a \n between lines: awk -v ORS='\n' '{a[NR]=$0} END{print a[1]"\n"a[2]"\n"a[NR-1]"\n"a[NR]}' – Amir Feb 19 '15 at 21:08
  • 1
    I understand but the bug was just that I was setting `ORS='\n'` when I should have been setting `OFS='\n'`. Now that that's fixed there's no need to explicitly hard-code `"\n"`s between fields. – Ed Morton Feb 19 '15 at 21:10
  • It struck me that the modulus logic used in populating/accessing the array contents might not be obvious so I updated my answer with an example that prints all relevant values as they are used. Hope that helps now and possibly as a future reference for other examples. – Ed Morton Feb 19 '15 at 21:59
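
For the short-input case raised in the comments, here is a sketch of the rotating-buffer idea that avoids both repeated and blank lines when the input has fewer than 2n lines. It assumes n is at least 1. The function name `first_last_awk` is hypothetical, and the 0-based index `NR % n` is used instead of the 1-based one purely to keep the sketch short.

```shell
# first_last_awk N: print the first N and last N lines of stdin without
# repeating lines when the input is shorter than 2*N.
# Hypothetical helper; assumes N >= 1.
first_last_awk() {
    awk -v n="$1" '
        NR <= n { print; next }        # head: print immediately
        { buf[NR % n] = $0 }           # tail: rotating buffer of n slots
        END {
            start = NR - n + 1         # first line number of the tail
            if (start <= n)            # lines 1..n were already printed,
                start = n + 1          # so skip past them
            for (i = start; i <= NR; i++)
                print buf[i % n]
        }'
}
```

Usage: `seq 1 5 | first_last_awk 2` prints 1, 2, 4, 5, while `seq 1 3 | first_last_awk 2` prints 1, 2, 3 with no duplicate line.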
3

This might work for you (GNU sed):

sed -n ':a;N;s/[^\n]*/&/2;Ta;2p;$p;D' file

This keeps a window of 2 lines (replace each 2 with n). It prints the first 2 lines, and at the end of the file it prints the window, i.e. the last 2 lines.

potong
2

Here's a GNU sed one-liner that prints the first 10 and last 10 lines:

gsed -ne'1,10{p;b};:a;$p;N;21,$D;ba'

If you want to print a '--' separator between them:

gsed -ne'1,9{p;b};10{x;s/$/--/;x;G;p;b};:a;$p;N;21,$D;ba'

If you're on a Mac and don't have GNU sed, you can't condense as much:

sed -ne'1,9{' -e'p;b' -e'}' -e'10{' -e'x;s/$/--/;x;G;p;b' -e'}' -e':a' -e'$p;N;21,$D;ba'

Explanation

gsed -n invokes sed without automatically printing the pattern space

-e'1,9{p;b}' print the first 9 lines

-e'10{x;s/$/--/;x;G;p;b}' print line 10 with an appended '--' separator

-e':a;$p;N;21,$D;ba' print the last 10 lines

parleer
1

awk -v n=4 'NR<=n; {b = b "\n" $0} NR>=n {sub(/[^\n]*\n/,"",b)} END {print b}'

The first n lines are covered by NR<=n;. For the last n lines, we just keep track of a buffer holding the latest n lines, repeatedly adding one to the end and removing one from the front (after the first n).

It's possible to do it more efficiently, with an array of lines instead of a single buffer, but even with gigabytes of input, you'd probably waste more in brain time writing it out than you'd save in computer time by running it.

ETA: Because the above timing estimate provoked some discussion in (now deleted) comments, I'll add anecdata from having tried that out.

With a huge file (100M lines, 3.9 GiB, n=5) it took 454 seconds, compared to @EdMorton's lined-buffer solution, which executed in only 30 seconds. With more modest inputs ("mere" millions of lines) the ratio is similar: 4.7 seconds vs. 0.53 seconds.

Almost all of that additional time in this solution seems to be spent in the sub() function; a tiny fraction also does come from string concatenation being slower than just replacing an array member.

Peter Mortensen
hemflit
1

If you are using a shell that supports process substitution, another way to accomplish this is to write to multiple processes, one for head and one for tail. Suppose for this example your input comes from a pipe feeding you content of unknown length. You want to use just the first 5 lines and the last 10 lines and pass them on to another pipe:

cat | { tee >(head -5) >(tail -10) 1>/dev/null; } | cat

The use of {} collects the output from inside the group (there will be two different programs writing to stdout inside the process substitutions). The 1>/dev/null gets rid of the extra copy tee will try to write to its own stdout.

That demonstrates the concept and all the moving parts, but it can be simplified a little in practice by using the STDOUT stream of tee instead of discarding it. Note the command grouping is still necessary here to pass the output on through the next pipe!

cat | { tee >(head -5) | tail -15; } | cat

Obviously replace cat in the pipeline with whatever you are actually doing. If your input command can write the same content to multiple files, you could eliminate tee entirely, as well as the monkeying with STDOUT. Say you have a command that accepts multiple -o output file name flags:

{ mycommand -o >(head -5) -o >(tail -10); } | cat
Caleb
0

Use GNU parallel. To print the first three lines and the last three lines (-k keeps the output in the order of the arguments, so head comes first):

parallel -k {} -n 3 file ::: head tail
Peter Mortensen
kko
0

Based on dcaswell's answer, the following sed script prints the first and last 10 lines of a file:

# Make a test file first
testit=$(mktemp -u)
seq 1 100 > "$testit"
# This sed script (at the last line the window holds 11 lines; s/// drops the oldest):
sed -n ':a;1,10h;N;${x;p;i\
-----
;x;s/[^\n]*\n//;p};11,$D;ba' "$testit"
rm "$testit"

Yields this:

1
2
3
4
5
6
7
8
9
10
-----
91
92
93
94
95
96
97
98
99
100
Peter Mortensen
steveo'america
0

Here is another AWK script. It allows for the head and tail overlapping.

File script.awk

BEGIN {range = 3} # Define the head and tail range
NR <= range {print} # Output the head; for the first lines in range
{ arr[NR % range] = $0} # Store the current line in a rotating array
END { # Last line reached
    for (row = NR - range + 1; row <= NR; row++) { # Reread the last range lines from array
        print arr[row % range];
    }
}

Running the script

seq 1 7 | awk -f script.awk

Output

1
2
3
5
6
7

For overlapping head and tail:

seq 1 5 | awk -f script.awk

Output

1
2
3
3
4
5
Peter Mortensen
Dudi Boy
0

Print the first and last n lines

  • For n=1:
seq 1 10 | sed '1p;$!d'

Output:

1
10
  • For n=2:
seq 1 10 | sed '1,2P;$!N;$!D'

Output:

1
2
9
10
  • For n>=3, use the generic sed script, replacing (n+1) and (n*2) with the computed numbers:
':a;$q;N;(n+1),(n*2)P;(n+1),$D;ba'

For n=3:

seq 1 10 | sed ':a;$q;N;4,6P;4,$D;ba'

Output:

1
2
3
8
9
10
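
The generic script can also be filled in by the shell. A sketch, assuming GNU sed (other sed implementations may not accept a label followed by `;` on the same line) and n of at least 1; `first_last_sed` is a made-up name.

```shell
# first_last_sed N: build the generic script for a given N and run it on stdin.
# Hypothetical helper; assumes GNU sed and N >= 1.
first_last_sed() {
    n=$1
    # $((n + 1)) and $((n * 2)) are computed by the shell;
    # \$ keeps a literal $ (sed's last-line address) inside double quotes.
    sed ":a;\$q;N;$((n + 1)),$((n * 2))P;$((n + 1)),\$D;ba"
}
```

Usage: `seq 1 10 | first_last_sed 3` runs the same script as the n=3 example and prints 1, 2, 3, 8, 9, 10.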