How can I read first n and last n lines from a file?

Question

How can I read the first n lines and the last n lines of a file?

For n=2, I read online that (head -n2 && tail -n2) would work, but it doesn't.

$ cat x
1
2
3
4
5
$ cat x | (head -n2 && tail -n2)
1
2

The expected output for n=2 would be:

1
2
4
5

http://unix.stackexchange.com/questions/139089/how-to-read-first-and-last-line-from-cat-output — Amir, Feb 19 '15 at 20:16
Also, the link you sent is not helpful because I do not know the range really. I am looking for a simple solution for this — Amir, Feb 19 '15 at 20:18
Interestingly, `cat x | (head -n2 && tail -n2)` doesn't work but `(head -n2 && tail -n2) < x` does. I'll have to meditate a bit on why that is. — Wintermute, Feb 19 '15 at 20:20
What would the expected output be if the input file was 3 lines long? Would it be `1 2 3` or `1 2 2 3` or something else? What if it was only 2 lines long - would the output be `1 2 1 2` or `1 1 2 2` or `1 2` or something else? — Ed Morton, Feb 19 '15 at 20:23
OK, now show the expected output for `n=3`, `n=5`, and `n=7` given that input file. — Ed Morton, Feb 19 '15 at 20:27
Well, my file has more than 2n lines. I am not sure what the answer to your question is when there are fewer than 2n lines. — Amir, Feb 19 '15 at 20:28
I don't think the `head && tail` trick is reliable. `head` from GNU coreutils behaves differently for pipes and regular files (source: the source), reading blockwise in one case but not the other. Depending on implementation details like that seems like a bad idea -- it's not guaranteed that `head` will leave everything it doesn't print for `tail` to work with. — Wintermute, Feb 19 '15 at 20:30
possible duplicate of [Truncate middle of piped text and replace with ellipsis in one command](http://stackoverflow.com/questions/28158685/truncate-middle-of-piped-text-and-replace-with-ellipsis-in-one-command) — rici, Feb 19 '15 at 20:41
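The comments above observe that the grouped `head && tail` behaves differently on a pipe than on a regular file. One cautious workaround, sketched here under the assumption that `mktemp` is available, is to spool the pipe to a temporary file and let each tool open that file by name. The function name `first_last_spooled` is hypothetical and not from the thread.

```shell
# Hypothetical helper (not from the thread): print the first n and last n
# lines of standard input by spooling the stream to a temporary file.
first_last_spooled() {
    n=$1
    tmp=$(mktemp) || return 1    # assumes mktemp exists on this system
    cat > "$tmp"                 # copy the whole stream to a regular file
    head -n "$n" "$tmp"          # each tool opens the file on its own,
    tail -n "$n" "$tmp"          # so neither depends on the other's buffering
    rm -f "$tmp"
}
```

Usage would look like `cat x | first_last_spooled 2`. The trade-off is disk space for the full stream, and lines repeat when the input has fewer than 2n lines.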

score 10 · Answer 1 · edited Feb 19 '15 at 20:56

head -n2 file && tail -n2 file

Answered Feb 19 '15 at 20:36 by srd; edited Feb 19 '15 at 20:56 by gniourf_gniourf.

UUOC. `head -n2 x && tail -n2 x` – rici Feb 19 '15 at 20:44
@rici: that was easy to fix `:D`. – gniourf_gniourf Feb 19 '15 at 20:56
This won't produce the correct output if the file is 3 lines long or less. – Walter Nissen Jan 17 '20 at 19:25
An explanation would be in order. – Peter Mortensen Aug 21 '20 at 16:17
This isn't guaranteed to work even if your file is longer than 4 lines, if a single `head` buffer is long enough that there aren't enough lines left in the file for `tail` to work. – Charles Duffy Feb 19 '21 at 00:22
Try `printf '%s\n' one two three four | { head -n1 && tail -n1; }` -- you'll see the `tail` has no output, because `head` filled its buffer with the data that `tail` would need to have for correct operation. – Charles Duffy Feb 19 '21 at 00:22
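As noted in the comments, running `head` and `tail` on the same file repeats lines when the file has fewer than 2n lines. A minimal sketch of one way to guard against that is to count the lines first and print the file once when the two ranges would overlap. The name `first_last_file` is hypothetical, and the sketch assumes a regular file whose last line ends with a newline (because `wc -l` counts newline characters).

```shell
# Hypothetical helper (not from the thread): first n and last n lines of a
# regular file, printing the file only once when it has at most 2n lines.
first_last_file() {
    n=$1
    file=$2
    # Arithmetic expansion strips the padding some wc implementations add.
    total=$(( $(wc -l < "$file") ))
    if [ "$total" -le $(( 2 * n )) ]; then
        cat "$file"              # head and tail would overlap: print it all
    else
        head -n "$n" "$file"
        tail -n "$n" "$file"
    fi
}
```

For the five-line file `x` from the question, `first_last_file 2 x` would print lines 1, 2, 4 and 5.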

Ed Morton · Accepted Answer · 2015-02-19T21:57:08.300

3

Chances are you're going to want something like:

... | awk -v OFS='\n' '{a[NR]=$0} END{print a[1], a[2], a[NR-1], a[NR]}'

or if you need to specify a number and taking into account @Wintermute's astute observation that you don't need to buffer the whole file, something like this is what you really want:

... | awk -v n=2 'NR<=n{print;next} {buf[((NR-1)%n)+1]=$0}
         END{for (i=1;i<=n;i++) print buf[((NR+i-1)%n)+1]}'

I think the math is correct on that - hopefully you get the idea to use a rotating buffer indexed by the NR modded by the size of the buffer and adjusted to use indices in the range 1-n instead of 0-(n-1).

To help with comprehension of the modulus operator used in the indexing above, here is an example with intermediate print statements to show the logic as it executes:

$ cat file   
1
2
3
4
5
6
7
8

$ cat tst.awk                
BEGIN {
    print "Populating array by index ((NR-1)%n)+1:"
}
{
    buf[((NR-1)%n)+1] = $0

    printf "NR=%d, n=%d: ((NR-1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
        NR, n, NR-1, (NR-1)%n, ((NR-1)%n)+1, ((NR-1)%n)+1, buf[((NR-1)%n)+1]

}
END { 
    print "\nAccessing array by index ((NR+i-1)%n)+1:"
    for (i=1;i<=n;i++) {
        printf "NR=%d, i=%d, n=%d: (((NR+i = %d) - 1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
            NR, i, n, NR+i, NR+i-1, (NR+i-1)%n, ((NR+i-1)%n)+1, ((NR+i-1)%n)+1, buf[((NR+i-1)%n)+1]
    }
}
$ 
$ awk -v n=3 -f tst.awk file
Populating array by index ((NR-1)%n)+1:
NR=1, n=3: ((NR-1 = 0) %n = 0) +1 = 1 -> buf[1] = 1
NR=2, n=3: ((NR-1 = 1) %n = 1) +1 = 2 -> buf[2] = 2
NR=3, n=3: ((NR-1 = 2) %n = 2) +1 = 3 -> buf[3] = 3
NR=4, n=3: ((NR-1 = 3) %n = 0) +1 = 1 -> buf[1] = 4
NR=5, n=3: ((NR-1 = 4) %n = 1) +1 = 2 -> buf[2] = 5
NR=6, n=3: ((NR-1 = 5) %n = 2) +1 = 3 -> buf[3] = 6
NR=7, n=3: ((NR-1 = 6) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, n=3: ((NR-1 = 7) %n = 1) +1 = 2 -> buf[2] = 8

Accessing array by index ((NR+i-1)%n)+1:
NR=8, i=1, n=3: (((NR+i = 9) - 1 = 8) %n = 2) +1 = 3 -> buf[3] = 6
NR=8, i=2, n=3: (((NR+i = 10) - 1 = 9) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, i=3, n=3: (((NR+i = 11) - 1 = 10) %n = 1) +1 = 2 -> buf[2] = 8
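The rotating-buffer one-liner above skips the first n lines when filling the buffer, so for inputs shorter than 2n lines it appears to print empty buffer slots. The following is a sketch of one possible guard, not the author's code: it reuses the same index formula but starts the final loop after the lines already printed as the head. The wrapper name `first_last_stream` is hypothetical, and n is assumed to be at least 1.

```shell
# Hypothetical wrapper (not from the answer): first n and last n lines of
# standard input in one pass, holding at most n lines in memory.
# Assumes n >= 1; no line is printed twice.
first_last_stream() {
    awk -v n="$1" '
        NR <= n { print; next }              # head: print immediately
        { buf[((NR - 1) % n) + 1] = $0 }     # tail: rotating buffer, indices 1..n
        END {
            start = NR - n + 1               # first line number of the tail
            if (start <= n) start = n + 1    # skip lines already printed as head
            for (i = start; i <= NR; i++)
                print buf[((i - 1) % n) + 1]
        }'
}
```

With the eight-line `file` above, `first_last_stream 3 < file` would print 1, 2, 3, 6, 7, 8; a five-line input would be printed once, unchanged.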

Answered Feb 19 '15 at 20:29 by Ed Morton; edited Feb 19 '15 at 21:57.

+1 since this works in a pipe. You might add a more elaborate version that handles files (streams) with fewer than 4 (head+tail) lines. – hek2mgl Feb 19 '15 at 20:31
@EdMorton But it would still need to buffer the whole stream in memory.. (However I don't see a way without buffering if it should work in a pipe, except saving the stream into a temporary file) – hek2mgl Feb 19 '15 at 20:34
Yeah, now it is not scalable for a large file. Still it works for me. – Amir Feb 19 '15 at 20:36
I wonder why cat x | (head -n2 && tail -n2) doesn't work... because this would be the perfect solution – Amir Feb 19 '15 at 20:39
You could keep just a running store of the last n lines you read. `tail` doesn't read from the beginning for regular files, mind you. – Wintermute Feb 19 '15 at 20:42
`awk -v n=2 'NR <= n { print; next } { delete lines[NR - n]; lines[NR] = $0 } END { for(i = NR - n + 1; i <= NR; ++i) print lines[i] }'` is what I was about to suggest. As a bonus, it doesn't print duplicate center lines in small files, although that wasn't a problem for OP. EDIT: Oh, I like the modulo idea. – Wintermute Feb 19 '15 at 20:46
To have a \n between lines: awk -v ORS='\n' '{a[NR]=$0} END{print a[1]"\n"a[2]"\n"a[NR-1]"\n"a[NR]}' – Amir Feb 19 '15 at 21:08
I understand but the bug was just that I was setting `ORS='\n'` when I should have been setting `OFS='\n'`. Now that that's fixed there's no need to explicitly hard-code `"\n"`s between fields. – Ed Morton Feb 19 '15 at 21:10
It struck me that the modulus logic used in populating/accessing the array contents might not be obvious so I updated my answer with an example that prints all relevant values as they are used. Hope that helps now and possibly as a future reference for other examples. – Ed Morton Feb 19 '15 at 21:59

score 3 · Answer 3 · answered Feb 20 '15 at 01:59

This might work for you (GNU sed):

sed -n ':a;N;s/[^\n]*/&/2;Ta;2p;$p;D' file

This keeps a window of 2 lines (replace each 2 with n), prints the first 2 lines, and at end of file prints the window, i.e. the last 2 lines.

Answered Feb 20 '15 at 01:59 by potong.

score 2 · Answer 4 · answered Dec 28 '17 at 05:41

Here's a GNU sed one-liner that prints the first 10 and last 10 lines:

gsed -ne'1,10{p;b};:a;$p;N;21,$D;ba'

If you want to print a '--' separator between them:

gsed -ne'1,9{p;b};10{x;s/$/--/;x;G;p;b};:a;$p;N;21,$D;ba'

If you're on a Mac and don't have GNU sed, you can't condense as much:

sed -ne'1,9{' -e'p;b' -e'}' -e'10{' -e'x;s/$/--/;x;G;p;b' -e'}' -e':a' -e'$p;N;21,$D;ba'

Explanation

gsed -ne' invokes sed without automatically printing the pattern space

-e'1,9{p;b}' print the first 9 lines

-e'10{x;s/$/--/;x;G;p;b}' print line 10 with an appended '--' separator

-e':a;$p;N;21,$D;ba' print the last 10 lines

score 1 · Answer 5 · edited Aug 21 '20 at 16:20

awk -v n=4 'NR<=n; {b = b "\n" $0} NR>=n {sub(/[^\n]*\n/,"",b)} END {print b}'

The first n lines are covered by NR<=n;. For the last n lines, we just keep track of a buffer holding the latest n lines, repeatedly adding one to the end and removing one from the front (after the first n).

It's possible to do it more efficiently, with an array of lines instead of a single buffer, but even with gigabytes of input, you'd probably waste more in brain time writing it out than you'd save in computer time by running it.

ETA: Because the above timing estimate provoked some discussion in (now deleted) comments, I'll add anecdata from having tried that out.

With a huge file (100M lines, 3.9 GiB, n=5) it took 454 seconds, compared to @EdMorton's lined-buffer solution, which executed in only 30 seconds. With more modest inputs ("mere" millions of lines) the ratio is similar: 4.7 seconds vs. 0.53 seconds.

Almost all of that additional time in this solution seems to be spent in the sub() function; a tiny fraction also does come from string concatenation being slower than just replacing an array member.
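Before timing the two approaches against each other, it may help to confirm that they produce the same output. The sketch below wraps the string-buffer command from this answer and the rotating-buffer command from the accepted answer in functions. The names `gen_lines`, `string_buf` and `ring_buf` are hypothetical, and the comparison assumes an input of at least 2n lines, since the two commands differ on shorter inputs.

```shell
# Hypothetical helper: generate N numbered lines without relying on seq.
gen_lines() {
    awk -v N="$1" 'BEGIN { for (i = 1; i <= N; i++) print i }'
}

# String-buffer approach, as written in this answer.
string_buf() {
    awk -v n="$1" 'NR<=n; {b = b "\n" $0} NR>=n {sub(/[^\n]*\n/,"",b)} END {print b}'
}

# Rotating-buffer approach, as written in the accepted answer.
ring_buf() {
    awk -v n="$1" 'NR<=n{print;next} {buf[((NR-1)%n)+1]=$0}
        END{for (i=1;i<=n;i++) print buf[((NR+i-1)%n)+1]}'
}
```

Comparing `gen_lines 1000 | string_buf 5` with `gen_lines 1000 | ring_buf 5` should show identical output. To repeat the timing, each pipeline can be prefixed with the shell's `time` (in bash, for example, `time (gen_lines 1000000 | string_buf 5 > /dev/null)`); the absolute numbers will depend on the machine and the awk implementation.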

Caleb · Answer 6 · 2019-08-05T09:36:23.373

If you are using a shell that supports process substitution, another way to accomplish this is to write to multiple processes, one for head and one for tail. Suppose for this example your input comes from a pipe feeding you content of unknown length. You want to use just the first 5 lines and the last 10 lines and pass them on to another pipe:

cat | { tee >(head -5) >(tail -10) 1>/dev/null; } | cat

The use of {} collects the output from inside the group (there will be two different programs writing to stdout inside the process shells). The 1>/dev/null is to get rid of the extra copy tee will try to write to its own stdout.

That demonstrates the concept and all the moving parts, but it can be simplified a little in practice by using the STDOUT stream of tee instead of discarding it. Note the command grouping is still necessary here to pass the output on through the next pipe!

cat | { tee >(head -5) | tail -15; } | cat

Obviously, replace cat in the pipeline with whatever you are actually doing. If your command can write the same content to multiple files, you could eliminate tee entirely, along with the monkeying with STDOUT. Say you have a command that accepts multiple -o output file name flags:

{ mycommand -o >(head -5) -o >(tail -10); } | cat

score 0 · Answer 7 · edited Aug 21 '20 at 16:21

Use GNU parallel. To print the first three lines and the last three lines:

parallel {} -n 3 file ::: head tail

Answered Apr 19 '18 at 23:32 by kko; edited Aug 21 '20 at 16:21 by Peter Mortensen.

score 0 · Answer 8 · edited Aug 21 '20 at 16:22

Based on dcaswell's answer, the following sed script prints the first and last 10 lines of a file:

# Make a test file first
testit=$(mktemp -u)
seq 1 100 > "$testit"
# This sed script:
sed -n ':a;1,10h;N;${x;p;i\
-----
;x;p};11,$D;ba' "$testit"
rm "$testit"

Yields this:

Answered May 29 '18 at 17:53 by steveo'america; edited Aug 21 '20 at 16:22 by Peter Mortensen.

And while it works for files shorter than 20 rows, it seems to swallow the last line for files shorter than 10 rows. ugh. – steveo'america May 29 '18 at 18:06

score 0 · Answer 9 · edited Aug 21 '20 at 16:24

Here is another AWK script, which allows for overlap between head and tail.

File `script.awk`

BEGIN {range = 3} # Define the head and tail range
NR <= range {print} # Output the head; for the first lines in range
{ arr[NR % range] = $0} # Store the current line in a rotating array
END { # Last line reached
    for (row = NR - range + 1; row <= NR; row++) { # Reread the last range lines from array
        print arr[row % range];
    }
}

Running the script

seq 1 7 | awk -f script.awk

Output

1
2
3
5
6
7

For overlapping head and tail:

seq 1 5 | awk -f script.awk

Output

1
2
3
3
4
5

score 0 · Answer 10 · answered Feb 09 '23 at 12:10

Print the first and last n lines

For n=1:

seq 1 10 | sed '1p;$!d'

Output:

1
10

For n=2:

seq 1 10 | sed '1,2P;$!N;$!D'

Output:

1
2
9
10

For n>=3, use the generic script, substituting the computed values of n+1 and n*2:

':a;$q;N;(n+1),(n*2)P;(n+1),$D;ba'

For n=3:

seq 1 10 | sed ':a;$q;N;4,6P;4,$D;ba'

Output:

1
2
3
8
9
10

Answered Feb 09 '23 at 12:10 by Geilson Figueiredo.

the generic regex works for any value of n, but the n=1 and n=2 versions are more efficient – Geilson Figueiredo Feb 09 '23 at 12:56
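The generic form above leaves n+1 and n*2 to be worked out by hand. One way to fill them in, sketched here as an assumption rather than as part of the answer, is to let the shell compute both addresses and pass each sed command as its own `-e` expression. The function name `first_last_sed` is hypothetical.

```shell
# Hypothetical wrapper (not from the answer): substitute n into the generic
# sed form ':a;$q;N;(n+1),(n*2)P;(n+1),$D;ba' using shell arithmetic.
first_last_sed() {
    n=$1
    sed -e ':a' -e '$q' -e 'N' \
        -e "$(( n + 1 )),$(( 2 * n ))P" \
        -e "$(( n + 1 )),\$D" \
        -e 'ba'
}
```

For example, `seq 1 10 | first_last_sed 3` runs the same commands as the n=3 line above. Splitting the script across `-e` options is a choice made here so the label `:a` stands alone, which some non-GNU sed implementations require.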
