Is there a fast way to read alternate bytes in dd

Question

I'm trying to read out every other pair of bytes in a binary file using dd in a loop, but it is unusably slow.

I have a binary file on a BusyBox embedded device containing data in rgb565 format. Each pixel is 2 bytes and I'm trying to read out every other pixel to do very basic image scaling to reduce file size.

The overall size is 640x480 and I've been able to read every other "row" of pixels by looping dd with a 960 byte block size. But doing the same for every other "column" that remains by looping through with a 2 byte block size is ridiculously slow even on my local system.

i=1
while [[ $i -le 307200 ]]
do
        dd bs=2 skip=$((i-1)) seek=$((i-1)) count=1 if=./tmpfile >> ./outfile 2>/dev/null
        let i=i+2
done

While I get the output I expect, this method is unusable.

Is there some less obvious way to have dd quickly copy every other pair of bytes?

Sadly I don't have much control over what gets compiled into BusyBox. I'm open to other possible methods but a dd/sh solution may be all I can use. For instance, one build has omitted head -c...

I appreciate all the feedback. I will check out each of the various suggestions and check back with results.

Why `seek=$((i-2))`? Why not simply `seek=2` to skip exactly two bytes each time? You shouldn't have to increment `i` at all; by default `dd` picks up reading where it stopped. (Also, my personal preference would be to use `of=./outfile` rather than redirecting standard output.) — B. Shefter, Apr 24 '19 at 19:05
Do you know the maximum length of the arguments to a command (`argmax`) on your system please? Normally you can find it with `sysctl -a | grep -i arg` — Mark Setchell, Apr 25 '19 at 01:34
No perl, no python `# sysctl -a | grep -i arg sysctl: error reading key 'net.ipv4.route.flush': Permission denied` I can try the seek=2 - knowing it picks up from where it left off could certainly lead to far more optimized operation. Redirecting standard output is simply an example in this case. — anti_climax, Apr 25 '19 at 15:39
@B.Shefter - are you saying I don't need to loop at all, or simply don't need to set seek with an incremented value from the loop? — anti_climax, Apr 25 '19 at 16:51
@anti-climax I think you still loop (that is, you keep reading two bytes at a time), but you don’t need a counter—just keep reading until there’s nothing left to read. — B. Shefter, Apr 25 '19 at 18:09

Gilles 'SO- stop being evil' · Answer 1 · 2019-04-24T21:20:37.070

Skipping every other character is trivial for tools like sed or awk as long as you don't need to cope with newlines and null bytes. But BusyBox's support for null bytes in sed and awk is poor enough that I don't think you can cope with them at all. It's possible to deal with newlines, but it's a giant pain because there are 16 different combinations to deal with depending on whether each position in a 4-byte block is a newline or not.

Since arbitrary binary data is a pain, let's translate to hexadecimal or octal! I'll draw some inspiration from the bin2hex and hex2bin scripts by Stéphane Chazelas. Since we don't care about the intermediate format, I'll use octal, which is a lot simpler to deal with because the final step uses printf, which only supports octal. Stéphane's hex2bin uses awk for the hexadecimal-to-octal conversion; an oct2bin can use sed. So in the end you need sh, od, sed and printf. I don't think you can avoid printf: it's critical to outputting null bytes. While od is essential, most of its options aren't, so it should be possible to tweak this code to support a very stripped-down od with a bit more postprocessing.

od -An -v -t o1 -w4 |
sed 's/^ \([0-7]*\) \([0-7]*\).*/printf \\\\\1\\\\\2/' |
sh
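As an illustration only (this is not part of the original answer), the same octal round trip can be wrapped in a function and extended to drop alternate rows in the same pass. The sketch below assumes awk is present in the BusyBox build and that od accepts -w4, as the pipeline above already requires. The function name halve_rgb565 and its width argument are hypothetical, and the width is assumed to be an even number of 2-byte pixels.

```shell
# Hypothetical helper: halve_rgb565 WIDTH <infile >outfile
# Keeps every other 2-byte pixel of every other row.
halve_rgb565() {
    # od emits one line per 4 bytes (two pixels), e.g. " 101 141 102 142".
    # awk keeps the first two octal fields of each line, but only for lines
    # that belong to an even-numbered row (WIDTH/2 od lines per row), and
    # turns them into a command such as: printf \\101\\141
    # sh then runs those commands to produce the binary output.
    od -An -v -t o1 -w4 |
    awk -v per_row="$(( $1 / 2 ))" '
        int((NR - 1) / per_row) % 2 == 0 {
            printf "printf \\\\%s\\\\%s\n", $1, $2
        }
    ' |
    sh
}

# A 4x4 test "image": each letter pair (Aa, Bb, ...) is one pixel.
printf 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPp' | halve_rgb565 4
echo
# prints AaCcIiKk
```

For the 640x480 image in the question, this would presumably be called as halve_rgb565 640 <./tmpfile >./outfile.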

The reason this is so fast compared to your dd-based approach is that BusyBox runs printf in the parent process, whereas dd requires its own process. Forking is slow. If I remember correctly, there's a compilation option which makes BusyBox fork for all utilities. In this case my approach will probably be as slow as yours. Here's an intermediate approach using dd which can't avoid the forks, but at least avoids opening and closing the file every time. It should be a little faster than yours.

i=$(($(wc -c <"$1") / 4))
exec <"$1"
dd ibs=2 count=1 conv=notrunc 2>/dev/null
while [ $i -gt 1 ]; do
  dd ibs=2 count=1 skip=1 conv=notrunc 2>/dev/null
  i=$((i - 1))
done
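If the dd loop is the only option, it could also drop the alternate rows in the same pass by varying skip. This is a minimal sketch, not the answer's own code. It assumes the input is a regular file, so that dd can seek relative to the shared file offset (which the loop above also relies on), and that the width is even. The name halve_with_dd and its arguments are hypothetical. It still forks one dd per output pixel, so it is unlikely to be fast.

```shell
# Hypothetical helper: halve_with_dd WIDTH HEIGHT INFILE >outfile
# Keeps every other 2-byte pixel of every other row.
# Note: plain sh has no local variables, so these names are global.
halve_with_dd() {
    width=$1
    height=$2
    exec 3<"$3"            # open the input once; every dd shares its offset
    row=0
    skip=0                 # the very first pixel is read without skipping
    while [ "$row" -lt "$height" ]; do
        col=0
        while [ "$col" -lt "$width" ]; do
            dd bs=2 skip="$skip" count=1 <&3 2>/dev/null
            skip=1         # within a row: skip one pixel, keep the next
            col=$((col + 2))
        done
        # Skip the last pixel of this row plus the whole next row.
        skip=$((width + 1))
        row=$((row + 2))
    done
    exec 3<&-              # close the input
}

# A 4x4 test "image": each letter pair (Aa, Bb, ...) is one pixel.
sample=$(mktemp)
printf 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPp' > "$sample"
halve_with_dd 4 4 "$sample"
echo
rm -f "$sample"
# prints AaCcIiKk
```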

For lack of a base64 conversion option I've actually done related manipulation to hex with the available tools. I'll have to investigate this line further. — anti_climax, Apr 25 '19 at 16:04
@anti_climax Hex is usually a little easier to work with than octal, but the conversion to/from binary is a lot easier with octal. Base64 (or uuencode, which may be available even if base64 isn't, I've had builds of BB with `uuencode` without the `-m` option) doesn't really help here because you want bytes grouped by 2 or 4, but base64 and uuencode group by 3. — Gilles 'SO- stop being evil', Apr 25 '19 at 16:49
I meant in the more general sense that I've manipulated binary data to other forms using the limited toolset for lack of a base64 encoding option, so I'm at least familiar with the overall mechanics if not the specifics needed here. — anti_climax, Apr 25 '19 at 17:08

Mark Setchell · Answer 2 · 2019-04-24T20:21:53.630

No idea if this will be faster or even possible with BusyBox, but it's a thought...

#!/bin/bash

# Empty result file
> result

exec 3< datafile
while true; do
    # Read 2 bytes into file "short"
    dd bs=2 count=1 <&3 > short 2> /dev/null
    [ ! -s short ] && break
    # Accumulate result file
    cat short >> result
    # Read two bytes and discard
    dd bs=2 count=1 <&3 > short 2> /dev/null
    [ ! -s short ] && break
done

Or this should be more efficient:

#!/bin/bash

exec 3< datafile
for ((i=0;i<76800;i++)) ; do
    # Skip 2 bytes then read 2 bytes
    dd bs=2 count=1 skip=1 <&3 2> /dev/null
done > result

Or, maybe you could use netcat or ssh to send the file to a sensible (more powerful) computer with proper tools to process it and return it. For example, if the remote computer had ImageMagick it could down-scale the image very simply.

The ultimate goal is to reduce the file size specifically to avoid transferring more data than is necessary over a slow metered connection. I've found that quarter-scale images are still readable, on top of being further cropped to specific areas before being compressed and transferred out. — anti_climax, Apr 25 '19 at 15:48

Mark Setchell · Answer 3 · 2019-04-25T15:34:47.720

Another option might be to use Lua, which has a reputation for being small, fast and well suited to embedded systems (see the Lua website). There are pre-built, downloadable binaries of it there too. It is also suggested on the BusyBox website.

I have never written any Lua before, so there may be some inefficiencies but this seems to work pretty well and processes a 640x480 RGB565 image in a few milliseconds on my desktop.

-- scale.lua
-- Usage: lua scale.lua input.bin output.bin
-- Scale an image by skipping alternate lines and alternate columns

-- Set up width, height and bytes per pixel
w   = 640
h   = 480
bpp = 2    

-- Open first argument for input, second for output
inp = assert(io.open(arg[1], "rb"))
out = assert(io.open(arg[2], "wb"))

-- Read image, one line at a time
for i = 0, h-1, 1 do
   -- Read a whole line
   line = inp:read(w*bpp)

   -- Only use every second line
   if (i % 2) == 0 then
      io.write("DEBUG: Processing row: ",i,"\n")
      -- Build up new, reduced line by picking substrings
      reduced=""
      for p = 1, w*bpp, bpp*2 do
         reduced = reduced .. string.sub(line,p,p+bpp-1)
      end
      io.write("DEBUG: New line length in bytes: ",#reduced,"\n")
      out:write(reduced)
   end
end

assert(out:close())

I created a greyscale test image with ImageMagick as follows:

magick -depth 16 -size 640x480 gradient: gray:image.bin

Then I ran the above Lua script with:

lua scale.lua image.bin smaller.bin

Then I made a JPEG I could view for testing with:

magick -depth 16 -size 320x240 gray:smaller.bin smaller.jpg

I have Lua available but I'm not sure if it will be much faster. I used a published base64 conversion script - for lack of *that* functionality or real bitwise operators in shell - and it too was unusably slow. — anti_climax, Apr 25 '19 at 15:51
@anti_climax Could you clarify what the actual device is that you are running on please? Also, what is the disk? — Mark Setchell, Apr 25 '19 at 16:01
It's an embedded device running off flash with I believe an ARM single core processor. Think 10 year old wireless router running stripped down BusyBox on the other side of a dial-up connection. I'm absolutely going to give it a try, I just don't have high hopes due to the system itself. — anti_climax, Apr 25 '19 at 16:40
Maybe put a $5 Raspberry Pi Zero W with built-in wifi (running Raspbian OS which is like Debian) next to it, and send the image there fast and for free for JPEG or any other compression using **ImageMagick**. Or if your embedded device doesn't have wifi, use a Raspberry Pi 3 with wired Ethernet. — Mark Setchell, Apr 26 '19 at 10:36
