
I just stumbled across a strange thing while coding in Java:

I read a file into a byte array (byte[] file_bytes) and want to produce hexdump output (like the hexdump or xxd utilities on Linux). Basically this works (see the for loop that is not commented out), but for larger files (>100 KiB) it takes a while to go through the byte-array chunks, do the formatting, and so on.

But if I replace the for loop with the code that is commented out (which uses a class containing the same for loop for the calculation!), it runs very fast.

What is the reason for this behavior?

Code snippet:

    [...]

    long counter = 1;
    int chunk_size = 512;
    int chunk_count = (int) Math.ceil((double) file_bytes.length / chunk_size);
    for (int i = 0; i < chunk_count; i++) {
        byte[] chunk = Arrays.copyOfRange(file_bytes, i * chunk_size, (i + 1) * chunk_size);

        // these two commented-out lines run much faster than the for loop below, even though the algorithm is the same!
        /*
         * String test = new BytesToHexstring(chunk).getHexstring();
         * hex_string = hex_string.concat(test);
         */

        for (byte b : chunk) {
            if( (counter % 4) != 0 ){
                hex_string = hex_string.concat(String.format("%02X ", b));
            } else{
                hex_string = hex_string.concat(String.format("%02X\n", b)); 
            }
            counter++;
        }
    }

    [...]

class BytesToHexstring:

class BytesToHexstring {
    private String m_hexstring;

    public BytesToHexstring(byte[] ba) {
        m_hexstring = "";
        m_hexstring = bytes_to_hex_string(ba);
    }

    private String bytes_to_hex_string(byte[] ba) {
        String hexstring = "";
        int counter = 1;

        // same calculation algorithm as in the code snippet above!
        for (byte b : ba) {
            if ((counter % 4) != 0) {
                hexstring = hexstring.concat(String.format("%02X ", b));
            } else {
                hexstring = hexstring.concat(String.format("%02X\n", b));
            }
            counter++;
        }
        return hexstring;
    }

    public String getHexstring() {
        return m_hexstring;
    }

}

String hex_string:

00 11 22 33
44 55 66 77
88 99 AA BB
CC DD EE FF

Benchmarks:

  1. file_bytes.length = 102400 bytes = 100 KiB

    • via class: ~0.7 sec
    • without class: ~5.2 sec

  2. file_bytes.length = 256000 bytes = 250 KiB

    • via class: ~1.2 sec
    • without class: ~36 sec

user3469811

1 Answer


There's an important difference between the two options. In the slow version, you concatenate every formatted byte directly onto the entire hex string built so far. String concatenation is slow because Java strings are immutable: each concat creates a new string and copies the whole existing one. As your string gets larger, this copying takes longer, and you copy the whole thing for every byte.

In the faster version, you build each chunk's string separately and concatenate only whole chunks onto the output string, rather than each individual byte. This means far fewer expensive concatenations. You still use concatenation while building up the chunk, but because a chunk is much smaller than the whole output, those concatenations are faster.
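
To put rough numbers on that difference, here is a minimal sketch that only counts copied characters; it does not measure time. It assumes each concat copies the whole resulting string and that every byte contributes 3 characters (two hex digits plus a separator). The class ConcatCost and its methods are hypothetical names made up for this illustration.

```java
// Hypothetical helper, not from the question: estimates how many characters
// get copied by each strategy. Assumption: every concat copies the complete
// resulting string, and each byte adds 3 characters ("FF " or "FF\n").
final class ConcatCost {
    static final int CHARS_PER_BYTE = 3;

    // Slow version: one concat per byte onto the whole output string.
    static long perByte(long byteCount) {
        long copied = 0;
        long length = 0;
        for (long i = 0; i < byteCount; i++) {
            length += CHARS_PER_BYTE; // the string grows by one formatted byte
            copied += length;         // the new string is copied in full
        }
        return copied;
    }

    // Fast version: per-byte concats onto a small chunk string,
    // then one concat per chunk onto the output string.
    static long perChunk(long byteCount, int chunkSize) {
        long copied = 0;
        long outputLength = 0;
        for (long start = 0; start < byteCount; start += chunkSize) {
            long bytesInChunk = Math.min(chunkSize, byteCount - start);
            copied += perByte(bytesInChunk);               // building the chunk string
            outputLength += bytesInChunk * CHARS_PER_BYTE;
            copied += outputLength;                        // appending the chunk to the output
        }
        return copied;
    }

    public static void main(String[] args) {
        long n = 102_400; // the 100 KiB case from the question
        System.out.println("per byte:  " + perByte(n));
        System.out.println("per chunk: " + perChunk(n, 512));
    }
}
```

Under these assumptions the 100 KiB case copies roughly 1.6 × 10^10 characters when concatenating per byte, versus roughly 1.1 × 10^8 when concatenating per chunk. The measured timings will not follow this ratio exactly, since String.format and other overhead are ignored here.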

You could do much better, though, by using a StringBuilder instead of string concatenation. StringBuilder is a class designed for building up strings incrementally and efficiently. It avoids the full copy on every append that concatenation requires. I expect that if you rewrite this to use StringBuilder, both versions will perform about the same and be faster than either version you already have.
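
A minimal sketch of what the StringBuilder variant could look like, keeping the same "%02X" formatting and the same 4-bytes-per-line layout as the question. The class name HexDump and the method toHexDump are hypothetical, and chunking is left out because it is no longer needed for speed.

```java
// Hypothetical sketch, not the asker's code: same output format as the
// question's loop, but appending to a StringBuilder instead of using concat.
final class HexDump {

    static String toHexDump(byte[] bytes) {
        // Each byte produces 3 characters, so the final size is known up front.
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        long counter = 1;
        for (byte b : bytes) {
            if ((counter % 4) != 0) {
                sb.append(String.format("%02X ", b));
            } else {
                sb.append(String.format("%02X\n", b));
            }
            counter++;
        }
        // Only one String is created, at the very end.
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77};
        System.out.print(toHexDump(sample));
    }
}
```

String.format is kept here only to stay close to the original loop; it still creates a short temporary string per byte, but that cost no longer grows with the size of the output.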

puhlen