
https://www.baeldung.com/linux/remove-last-n-lines-of-file

awk -v n=3 'NR==FNR{total=NR;next} FNR==total-n+1{exit} 1' input.txt input.txt 
01 is my line number. Keep me please!
02 is my line number. Keep me please!
03 is my line number. Keep me please!
04 is my line number. Keep me please!
05 is my line number. Keep me please!
06 is my line number. Keep me please!
07 is my line number. Keep me please!

Here is a way to remove the last n lines, but it is not done in place, the file is read twice, and it only handles one file at a time.

How can I remove the last n lines of many files in place, without opening each file more than once, using a single gawk command and no other external commands?

user1424739
  • You could try like: `awk -v n="3" -v total=$(wc -l < Input_file) 'FNR==total-n+1{exit} 1' Input_file` which will help you to get lines with single run itself, cheers. – RavinderSingh13 Dec 30 '22 at 02:28
  • I just want to use gawk without using any other external commands. – user1424739 Dec 30 '22 at 02:29
  • Any particular reason to choose `awk`? Using `head -n -3` would likely be the fastest solution. You'll have to add code for inplace editing, but that would be similar to what `inplace` option does for you. – Sundeep Dec 30 '22 at 06:00
  • @Sundeep : you'll be surprised how little time savings there is with `head` (`N = 1639779`): `in0: 100MiB 0:00:00 [1006MiB/s] [1006MiB/s] [=> ] 10% ETA 0:00:00 out9: 715MiB 0:00:00 [ 748MiB/s] [ 748MiB/s] [ <=> ] in0: 988MiB 0:00:00 [1.07GiB/s] [1.07GiB/s] [======================>] 100% ( pvE 0.1 in0 < "${fn1}" | ghead -n -"$N"; ) 0.43s user 0.73s system 117% cpu 0.982 total 6fa4d6fbf7a4900db216024d220322c9 stdin` …... – RARE Kpop Manifesto Dec 30 '22 at 06:20
  • @Sundeep : ….. `in0: 338MiB 0:00:00 [3.30GiB/s] [3.30GiB/s] [=======> ] 34% ETA 0:00:00 out9: 715MiB 0:00:01 [ 684MiB/s] [ 684MiB/s] [ <=> ] in0: 988MiB 0:00:00 [3.28GiB/s] [3.28GiB/s] [======================>] 100% ( pvE 0.1 in0 < "${fn1}" | mawk2 -v N="$N" ; ) 0.30s user 0.68s system 90% cpu 1.071 total 6fa4d6fbf7a4900db216024d220322c9 stdin` – RARE Kpop Manifesto Dec 30 '22 at 06:20
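Sundeep's `head` route from the comments can be sketched as follows (a minimal sketch, assuming GNU `head`, which accepts a negative line count; `demo.txt` is a made-up file name). Since `head` has no in-place mode, each file goes through a temporary file:

```shell
# Create a 5-line sample file, then drop its last 3 lines "in place"
# by writing head's output to a temp file and renaming it back.
printf '%s\n' 1 2 3 4 5 > demo.txt
for f in demo.txt; do
  head -n -3 -- "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"
done
cat demo.txt    # only lines 1 and 2 remain
```

This still relies on external commands (`head`, `mv`), which is exactly what the question wants to avoid.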

4 Answers


With your shown samples, please try the following awk code. It does not use any external utilities, as requested in the question. We can make use of awk's END block here.

awk -v n="3" '
{
  total=FNR          # number of lines seen so far
  lines[FNR]=$0      # buffer every line in an array
}
END{
  till=total-n       # print everything except the last n lines
  for(i=1;i<=till;i++){
    print lines[i]
  }
}
' Input_file
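For instance, with a hypothetical 7-line `Input_file` and `n=3`, the buffered-array approach above prints only the first 4 lines (note the output goes to stdout, so this is still not in place):

```shell
# Build a 7-line sample, then run the answer's script with n=3.
printf '%s\n' 01 02 03 04 05 06 07 > Input_file
awk -v n=3 '
{ total=FNR; lines[FNR]=$0 }
END { for (i=1; i<=total-n; i++) print lines[i] }
' Input_file    # prints 01 through 04
```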
RavinderSingh13

A single-pass awk solution that requires neither arrays nor gawk (unless your file is over 500 MB, in which case it might be slightly slower):

rm -f file.txt

jot -c 30 51 > file.txt

gcat -n file.txt | rs -t -c$'\n' -C'#' 0 5 | column -s'#' -t

 1  3       7   9      13   ?      19   E      25   K
 2  4       8   :      14   @      20   F      26   L
 3  5       9   ;      15   A      21   G      27   M
 4  6      10   <      16   B      22   H      28   N
 5  7      11   =      17   C      23   I      29   O
 6  8      12   >      18   D      24   J      30   P
mawk -v __='file.txt' -v N='13' 'BEGIN { 

OFS = FS = RS
      RS = "^$"

getline <(__); close(__)
  
print $!(NF -= NF < (N+=_==$NF) ? NF : N) >(__) }'
gcat -n file.txt | rs -t -c$'\n' -C'#' 6 | column -s'#' -t ;


 1  3       7   9      13   ?
 2  4       8   :      14   @
 3  5       9   ;      15   A
 4  6      10   <      16   B
 5  7      11   =      17   C
 6  8      12   >

Speed is hardly a concern:

115K rows 198 MB file took 0.254 secs
rows       = 115567. | UTF8 chars = 133793410. | bytes      = 207390680.

( mawk2 -v __="${fn1}" -v N='13' ; )  
0.04s user 0.20s system 94% cpu 0.254 total
 
rows       = 115554. | UTF8 chars = 133779254. | bytes      = 207370006.
5.98 million rows 988 MB file took 1.44 secs
rows       = 5983333. | UTF8 chars = 969069988. | bytes      = 1036334374.

( mawk2 -v __="${fn1}" -v N='13' ; )
0.33s user 1.07s system 97% cpu 1.435 total
 
rows       = 5983320. | UTF8 chars = 969068062. | bytes      = 1036332426.
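The core trick above is slurping the whole file into a single record with `RS = "^$"` and then shrinking `NF` to chop off the trailing lines. A plainer sketch of the same idea (an assumption-laden illustration: it needs gawk or mawk for the multi-character `RS`, a file that fits in memory, and a trailing newline, which leaves an empty last field to account for):

```shell
# Slurp the file as one record; each line becomes one field.
printf '%s\n' 1 2 3 4 5 > slurp.txt
awk -v n=2 'BEGIN { RS = "^$"; FS = OFS = "\n" }
{ NF -= n + 1; print }' slurp.txt    # prints 1, 2, 3
```

Decrementing `NF` rebuilds `$0` from the remaining fields, which is how the answer avoids an array entirely.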
RARE Kpop Manifesto

Another way to do it, using GAWK's special patterns BEGINFILE and ENDFILE:

# invoked e.g. as: gawk -v n=3 -f thisScript.awk file1 file2 ...
{ lines[++numLines] = $0 }
BEGINFILE { fname=FILENAME }
ENDFILE { prt() }

function prt(   lineNr,maxLines) {
    close(fname)
    printf "" > fname                  # truncate the file just read
    maxLines = numLines - n
    for ( lineNr=1; lineNr<=maxLines; lineNr++ ) {
            print lines[lineNr] > fname
    }
    close(fname)
    numLines = 0                       # reset the buffer for the next file
}
Luuk

I find that this is the most succinct solution to the problem.

$ gawk -i inplace -v n=3 -v ORS= -e '{ lines[FNR]=$0 RT }
ENDFILE {
    for(i=1;i<=FNR-n;++i) {
        print lines[i]
    }
}' -- file{1..3}.txt
user1424739