More exactly, the question is:
Which recipes are there to enable bash scripts to properly and safely process N bytes which might contain NUL?
This question led to the following observation:
bash -c 'LC_ALL=C read -rN 1 </dev/zero'
- Tested with Debian 10's bash version 5.0.17(1)-release
(I tried to find out myself but found no pointer to why this happens.) All I have found out so far is that "my" bash apparently skips all NUL bytes on read -N, so a read from a source that delivers only NUL bytes never returns.
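The skipping can be reproduced on a finite input, which avoids the endless read from /dev/zero. This is only a minimal sketch, assuming a bash whose read drops NUL bytes as described above; the variable names buf and out are just for illustration.

```bash
# Feed the bytes a, NUL, b, c through a pipe and ask read for exactly 2 characters.
# If read skips the NUL, buf ends up holding "ab": the NUL was not counted.
out=$(printf 'a\0bc' | { LC_ALL=C IFS= read -rN 2 buf; printf '%s' "$buf"; })
printf 'read -N 2 returned: %q\n' "$out"
```

If the NUL were counted as one of the 2 characters, the result would be just "a", because a bash variable cannot hold a NUL anyway.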
A possible workaround in the special case of -N 1 is to use
LC_ALL=C IFS= read -rd '' -n 1
so that NUL acts as the delimiter and read returns. But this trick fails if you want to skip over more than 1 byte, because read then terminates after the first NUL it sees.
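To make the single-byte trick concrete, here is a sketch of how one such read can tell an ordinary byte, a NUL and end of input apart. It assumes that read returns success with an empty variable when it hits the delimiter, and a non-zero status at end of input. The helper name read_byte and the variable byte are hypothetical, not anything bash provides.

```bash
# read_byte: read one byte from stdin into the global variable "byte".
# Returns 0 for an ordinary byte, 1 for a NUL byte, 2 at end of input.
read_byte() {
  if LC_ALL=C IFS= read -rd '' -n 1 byte; then
    [ -n "$byte" ] && return 0   # got a regular byte
    return 1                     # success but empty: the delimiter (NUL) was hit
  fi
  return 2                       # read failed: end of input
}

# Usage: the input is "x", then a NUL, then end of input.
out=$(printf 'x\0' | {
  read_byte; a=$?
  read_byte; b=$?
  read_byte; c=$?
  printf '%s %s %s' "$a" "$b" "$c"
})
printf '%s\n' "$out"
```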
For special cases there are workarounds, like forking off dd, but if you want to process the data in bash or often need to skip just a few bytes, forking hurts more than it helps.
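For reference, the dd variant could look roughly like this. It is a sketch that assumes a dd which honours bs=1 and count= on a pipe (as POSIX describes); the sample input and the variable names line and rest are made up.

```bash
# Skip exactly 3 bytes (a, NUL, b) from a pipe, then continue reading in bash.
# bs=1 makes dd issue 1-byte reads, so it does not consume more than count bytes.
rest=$(printf 'a\0bnext\n' | {
  dd bs=1 count=3 >/dev/null 2>&1   # one fork per skip, which is the cost discussed above
  IFS= read -r line                 # bash picks up right after the skipped bytes
  printf '%s' "$line"
})
printf 'after skipping: %s\n' "$rest"
```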
Also, looping over read -d '' -n 1 is cumbersome if you want to skip over bigger NUL areas, because this costs one syscall per byte.
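Such a loop might be sketched as follows. The function name copy_bytes is hypothetical, and the sketch only illustrates the byte-by-byte approach and its cost; it is not meant as a recommended solution.

```bash
# copy_bytes N: copy exactly N bytes from stdin to stdout, NUL bytes included.
# Returns non-zero if the input ends early. Costs one read call per byte.
copy_bytes() {
  local n=$1 byte
  while (( n-- > 0 )); do
    if LC_ALL=C IFS= read -rd '' -n 1 byte; then
      # An empty result with success means the delimiter (NUL) was read.
      if [ -n "$byte" ]; then printf '%s' "$byte"; else printf '\0'; fi
    else
      return 1   # end of input before N bytes were seen
    fi
  done
}

# Usage: skip 3 bytes (a, NUL, b), then read the following line as usual.
rest=$(printf 'a\0bnext\n' | { copy_bytes 3 >/dev/null; IFS= read -r line; printf '%s' "$line"; })
printf 'after skipping: %s\n' "$rest"
```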
Notes:
- This is not a question about opinions on which solution is best.
- This is a question to list ways to handle the most common cases.
- And the answers should be applicable to use cases like:
  - Pipes, where you cannot seek
  - Sockets (like <>"/dev/tcp/$HOST/$PORT")
Please always keep in mind that "performance" includes more than just raw speed. It often includes the time you need to change something, where rewriting things from scratch takes too long, or plugging in something like dd gets extremely difficult. Quite often all you have is just pure bash, plus some helpers.
For example, there might be some bigger script which is applied to something like git fast-export. This script works perfectly until the first binary with a NUL byte is added to the repo. Suddenly read -N goes out of sync, so that git fast-import complains. If the code is mainly used to edit commit messages (which are treated like the binary data), you have to duplicate your code: one NUL-aware copy for binary data, and one copy for commits, which you want to change in bash.
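The loss of sync can be reproduced without git. The sketch below feeds a hand-made stream that only imitates the "data <count>" framing used in fast-export streams; the payload, the trailing line and all variable names are made up for illustration.

```bash
# The header announces 3 payload bytes: a, NUL, b. The next line is "next".
out=$(printf 'data 3\na\0bnext\n' | {
  IFS=' ' read -r cmd len                  # header line: "data 3"
  LC_ALL=C IFS= read -rN "$len" payload    # meant to consume exactly the 3 payload bytes
  IFS= read -r following                   # the line after the payload
  printf '%s|%s' "$payload" "$following"
})
printf '%s\n' "$out"
```

When the NUL is skipped, read -N takes one byte too many: payload becomes "abn" and the following line arrives as "ext" instead of "next", so everything after that point is parsed at the wrong offset.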
Probably there is no such thing as one size fits all, so we likely need more solutions than just calling dd.