I'm going insane trying to get this to work

I'm trying to duplicate the following Java function:

https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Hex.html#decodeHex-java.lang.String-

Here is the Java code:

public static byte[] decodeHex(final String data) throws DecoderException {
    return decodeHex(data.toCharArray());
}

public static byte[] decodeHex(final char[] data) throws DecoderException {
    final int len = data.length;
    if ((len & 0x01) != 0) {
        throw new DecoderException("Odd number of characters.");
    }
    final byte[] out = new byte[len >> 1];
    // two characters form the hex value.
    for (int i = 0, j = 0; j < len; i++) {
        int f = toDigit(data[j], j) << 4;
        j++;
        f = f | toDigit(data[j], j);
        j++;
        out[i] = (byte) (f & 0xFF);
    }
    return out;
}

protected static int toDigit(final char ch, final int index) throws DecoderException {
    final int digit = Character.digit(ch, 16);
    if (digit == -1) {
        throw new DecoderException("Illegal hexadecimal character " + ch + " at index " + index);
    }
    return digit;
}

Given the string `01e703000000000000`, the byte array should contain

[1, -25, 3, 0, 0, 0, 0, 0, 0]


I thought this might be standard hex decoding, so I used the decoding function I've been using:

extension String {
    /// A data representation of the hexadecimal bytes in this string.
    var hexDecodedData: Data {
        // Get the UTF8 characters of this string
        let chars = Array(utf8)
        // Keep the bytes in an UInt8 array and later convert it to Data
        var bytes = [UInt8]()
        bytes.reserveCapacity(count / 2)
        // It is a lot faster to use a lookup map instead of `strtoul`
        let map: [UInt8] = [
            0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, // 01234567
            0x08, 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 89:;<=>?
            0x00, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x00, // @ABCDEFG
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00  // HIJKLMNO
        ]
        // Grab two characters at a time, map them and turn it into a byte
        for i in stride(from: 0, to: count, by: 2) {
            let index1 = Int(chars[i] & 0x1F ^ 0x10)
            let index2 = Int(chars[i + 1] & 0x1F ^ 0x10)
            bytes.append(map[index1] << 4 | map[index2])
        }
        return Data(bytes)
    }
}
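As an aside, the index expression `chars[i] & 0x1F ^ 0x10` in the snippet above folds the ASCII codes of `0-9`, `A-F`, and `a-f` into the 32-entry lookup table. A quick sketch of the mapping:

```swift
// How the lookup index is derived in hexDecodedData:
// ASCII '0'-'9' are 0x30-0x39; 'A'-'F' are 0x41-0x46; 'a'-'f' are 0x61-0x66.
// `& 0x1F` keeps the low 5 bits and `^ 0x10` flips bit 4, so all three
// ranges land inside the 32-entry table:
//   '0' (0x30) -> 0,  '9' (0x39) -> 9
//   'A' (0x41) -> 17, 'F' (0x46) -> 22
//   'a' (0x61) -> 17, 'f' (0x66) -> 22
for ch in "09AFaf".utf8 {
    let index = Int(ch & 0x1F ^ 0x10)
    print(Character(UnicodeScalar(ch)), "->", index)
}
```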

This results in

[1, 231, 3, 0, 0, 0, 0, 0, 0]

So then I tried converting the Java code to Swift myself:

extension String {
    // public static byte[] decodeHex(final char[] data) throws DecoderException
    func decodeHex() throws -> [Int] {
        let stringArray = Array(self)
        let len = count
        var out: [Int?] = Array(repeating: nil, count: len >> 1)
        if (len & 0x01) != 0 {
            throw HexDecodingError.oddNumberOfCharacters
        }
        var i = 0
        var j = 0
        while j < len {
            var f: Int = try Self.toDigit(char: String(stringArray[j]), index: j)
            j += 1
            f = f | (try Self.toDigit(char: String(stringArray[j]), index: j))
            j += 1
            out[i] = f & 0xFF
            i += 1
        }
        return out.compactMap { $0 }
    }

    enum HexDecodingError: Error {
        case oddNumberOfCharacters
        case illegalCharacter(String)
        case conversionToDigitFailed(String, Int)
    }

    static func toDigit(char: String, index: Int) throws -> Int {
        let digit = Int(char, radix: 16)
        if digit == -1 {
            throw HexDecodingError.illegalCharacter(char)
        }
        guard let digit = digit else {
            throw HexDecodingError.conversionToDigitFailed(char, index)
        }
        return digit
    }
}

Which results in

[1, 15, 3, 0, 0, 0, 0, 0, 0]

What is going on? What am I doing wrong?

EDIT: Also, how can there possibly be a negative number in there, since a byte array is represented as a `[UInt8]`?

  • The result of your first code attempt looks correct: Java doesn’t have unsigned integers, but Swift does. Signed byte `-25` (Java `byte`, Swift `Int8`) has the same bit pattern as unsigned byte `231` (Swift `UInt8`). Are you looking for a different result? (If you need to match Java results _exactly_, then you should be using `Int8` instead of `UInt8`.) – Itai Ferber Apr 10 '22 at 16:54
  • In your own translation of the Java function, it is incorrect to translate Java's `byte` to `Int`. Java's `byte` is signed and so should be represented as `Int8` in Swift. That should answer the question of "Also how can there possibly be a negative number in there since a byte array is represented as a `[UInt8]`". – Sweeper Apr 10 '22 at 17:14
  • @ItaiFerber is correct, thank you; the issue is resolved. Feel free to add this as an answer and I'll accept. Yes, the code is posted on SO; I was looking for an explanation of why the bytes appeared different in Java vs Swift, as I'm writing tests to match existing Java unit tests. – aryaxt Apr 10 '22 at 17:24
  • @aryaxt Updated the duplicate post to throw errors as well: [How to convert hexadecimal string to an array of UInt8 bytes in Swift?](https://stackoverflow.com/a/43360864/2303865) – Leo Dabus Apr 10 '22 at 22:20
  • @aryaxt Note that checking if `digit == -1` doesn't make any sense. `Int.init(_:radix:)` is a failable initializer (`init?(_ text: S, radix: Int = 10) where S : StringProtocol`); it fails instead of returning `-1`. In other words, it returns `nil`. – Leo Dabus Apr 10 '22 at 22:33
  • Note also that the `Character` type has a `hexDigitValue` instance property which returns an optional `Int`: `let digit = char.hexDigitValue`. – Leo Dabus Apr 10 '22 at 22:36
  • Yeah, I blindly copied it from Java just to see the result; I ended up keeping the initial implementation. – aryaxt Apr 11 '22 at 04:42

1 Answer


The Java `byte` primitive type

> is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive).

(Java, in general, does not have unsigned primitive types, only signed ones.)

In your Java output, the -25 value corresponds to hex E7 in your string, whose decimal value is E * 16 + 7 = 14 * 16 + 7 = 231; 231 is outside of the [-128, 127] range, and wraps around to -25. (More precisely the bit pattern of unsigned 8-bit 231 corresponds to the bit pattern of signed 8-bit -25 in two's-complement.)

In Swift, you're using a UInt8 to represent results (both explicitly, and implicitly in a Data value), and the range of UInt8 is [0, 255]; 231 fits within this range, and is what you see in your first Swift code snippet. The results are bitwise equivalent, but if you need results which are type equivalent to what you're seeing in Java, you'll need to work in terms of Int8 instead of UInt8.
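A minimal sketch of that reinterpretation, using the standard library's `Int8(bitPattern:)` initializer:

```swift
// The same 8 bits, read as unsigned vs. as two's-complement signed:
let unsigned: UInt8 = 0xE7                // 231
let signed = Int8(bitPattern: unsigned)   // -25
print(unsigned, signed)                   // 231 -25

// To get Java-style signed output from a [UInt8] (or the bytes of a Data),
// reinterpret each byte:
let bytes: [UInt8] = [1, 231, 3, 0, 0, 0, 0, 0, 0]
let javaStyle = bytes.map { Int8(bitPattern: $0) }
print(javaStyle)                          // [1, -25, 3, 0, 0, 0, 0, 0, 0]
```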

Itai Ferber