Here's a pure awk-based solution without using wc, head, tail, or any tracking arrays. It finished a 1.85 GiB file in 2.456 seconds.
It might be a tiny bit slower than the wc -l method, but this one works for input coming through a pipe as well as for files with blank lines in between.
So as long as your input is smaller than, say, 2.5 GB, you might as well do it in one shot instead of the "idiomatic way".
__="${m3t}"
pvE0 < "$__" | wc5
sleep 2
( time ( pvE0 < "$__" |
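mawk2 'BEGIN { FS = OFS = RS    # split and rejoin on newline, so each line is one field
RS = "^$"                       # slurp the whole input as a single record
_+=_^= ORS = ""                 # ORS = "", then _ = _^"" = 1, then _ += 1, so _ ends up as 2
# pattern below: needs NF >= 4; drops 3 fields, plus the empty one left by a final newline
} _^_ <= NF && NF -= _^("" == $NF)+_'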
) | pvE9 ) | wc5
sleep 2
( time ( pvE0 < "$__" | mawk2 '…' ) | pvE9 ) | xxh128sum
in0: 1.85GiB 0:00:13 [ 143MiB/s] [ 143MiB/s][===========>] 100%
rows = 12494275. | UTF8 chars = 1285316715. | bytes = 1983544693.
in0: 322MiB 0:00:00 [3.15GiB/s] [3.15GiB/s] [====> ] 17% ETA 0:00:00
out9: 1.85GiB 0:00:15 [ 125MiB/s] [ 125MiB/s] [ <=> ]
in0: 1.85GiB 0:00:00 [3.26GiB/s] [3.26GiB/s] [============>] 100%
( pvE 0.1 in0 < "$__" | mawk2 ; )
0.71s user 1.48s system 14% cpu 15.032 total
rows = 12494272. | UTF8 chars = 1285314823. | bytes = 1983539445.
in0: 330MiB 0:00:00 [3.23GiB/s] [3.23GiB/s] [====> ] 17% ETA 0:00:00
out9: 1.85GiB 0:00:02 [ 774MiB/s] [ 774MiB/s] [ <=> ]
in0: 1.85GiB 0:00:00 [3.23GiB/s] [3.23GiB/s] [=========>] 100%
( pvE 0.1 in0 < "$__" | mawk2 ; )
0.71s user 1.46s system 88% cpu 2.456 total
45f50e894dae5cefcf3acc47fc402219 stdin
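For anyone who finds the one-liner above hard to read, here is a rough, more conventional sketch of what I believe it does: slurp the input as one record, treat each line as a field, and shrink NF to drop the last few lines (3 in the benchmark, judging by the row counts). The function name drop_last_lines and the n parameter are my own additions for illustration, and it assumes an awk with regex RS support, such as gawk or mawk.

```shell
# Hypothetical helper (not part of the original answer):
# drop the last n lines of stdin, default n = 3.
# Assumes an awk that treats a multi-character RS as a regex (gawk, mawk).
drop_last_lines() {
    awk -v n="${1:-3}" '
    BEGIN {
        FS = OFS = "\n"   # each input line becomes one field
        RS = "^$"         # never matches non-empty input, so the whole input is one record
        ORS = ""          # like the original, no newline is appended after the last kept line
    }
    {
        if ($NF == "") NF--              # input ended with a newline: drop the empty last field
        if (NF > n) { NF -= n; print }   # shrinking NF rebuilds $0 without the last n lines
    }'
}
```

Usage would look like `drop_last_lines 3 < file` or `some_command | drop_last_lines 3`. Since everything is held in one record, the same memory caveat applies as for the original.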