
I have a question about this code:

    String st = "AaHello";
    byte[] bt = st.getBytes("UTF-8");
    for (int i = 0; i < bt.length; i++){
        System.out.println(bt[i]);
    }
  

I don't quite understand why I got the following output: 65 97 72 101 108 108 111

I think 65 is the ASCII code for A, 97 is the ASCII code for a, 72 is the ASCII code for H, and so on, so it seems to be printing ASCII codes.

But shouldn't it print the bytes in UTF-8 encoding format (hexadecimal or binary)? For example, shouldn't A be printed as 41 and a as 61?

Why is it not printing the UTF-8 byte codes?

On the other hand, the example on the Oracle Java tutorial site (https://docs.oracle.com/javase/tutorial/i18n/text/string.html) does print the corresponding UTF-8 bytes. For example, it shows utf8Bytes[0] = 0x41.

Are my code and the example's code doing the same thing? I used the getBytes() method just like the example does, so why does my code not seem to print the byte codes while the example on the Oracle site prints them correctly?

john_w
  • Did you miss the conversion that is done in the example you shared? It uses `UnicodeFormatter.java`. – Jorge Campos Aug 10 '21 at 20:06
  • It *is* printing the byte code. A byte is a numeric value, and the default *representation* of a numeric value is base 10. If you want to print it in hex you have to do so yourself :) The linked code does exactly that, in `printBytes`. – Dave Newton Aug 10 '21 at 20:07
  • `byte` is a numeric type; its default conversion to `String` is in decimal format, the same as other numeric types. – kaya3 Aug 10 '21 at 20:09
  • @DaveNewton Hello, so default representation means ASCII CODE? (so my code is printing out the correct bytecode, but it is just that it is in ASCII representation?) – john_w Aug 10 '21 at 20:14
  • ASCII is a character. The default representation of a _number_ is base 10. The ASCII _code_ for a character is a number. – Dave Newton Aug 10 '21 at 20:17
  • No, you take a string and it gets encoded to bytes, then you're printing the numeric value of the bytes. If you want to see a hex value you have to tell Java to print the hex value of the number. As an aside, UTF-8 uses the same encoding as ASCII for the first 128 characters. – matt Aug 10 '21 at 20:18
  • You seem to be a bit confused here. `0x41` is equal to `65`; they're the same number but in different bases (`0x41` is hexadecimal or base 16 whereas `65` is decimal or base 10). – Aplet123 Aug 10 '21 at 20:24
  • If you want the same output: `System.out.printf("0x%02X%n", bt[i]);` That will work for you in the 'ASCII range' – g00se Aug 10 '21 at 20:36
  • Thank you all for your answers and explanations. I think I got it now. – john_w Aug 10 '21 at 22:27

1 Answer

65 decimal == 41 hexadecimal

As comments indicate, your output when printing a byte value is a number in base 10, decimal. That means 65 for the byte value for A, as the US-ASCII number assigned to that letter is sixty-five. (The Unicode code point is also 65, with Unicode being a superset of US-ASCII.)
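To make the point concrete, here is a minimal sketch showing that 65 and 0x41 are the same value written in different bases. The class name `RadixDemo` and the helper `render` are made up for illustration; `Integer.toString(int, int)` is the standard JDK method used to pick the base.

```java
public class RadixDemo {
    // Hypothetical helper: renders one byte value in the given radix
    // (10 = decimal, 16 = hexadecimal, 2 = binary).
    static String render(byte b, int radix) {
        return Integer.toString(b, radix);
    }

    public static void main(String[] args) {
        byte a = 65; // the UTF-8 (and US-ASCII) byte for 'A'

        System.out.println(a == 0x41);     // true: same number, different notation
        System.out.println(render(a, 10)); // 65
        System.out.println(render(a, 16)); // 41
        System.out.println(render(a, 2));  // 1000001
    }
}
```

The byte stored in the array never changes; only the text chosen to display it does.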

The 0x41 seen in the Oracle Tutorial is text generated by the UnicodeFormatter class, using hexadecimal digits (base 16) rather than decimal (base 10). Look for the call UnicodeFormatter.byteToHex(array[k]) in the example code on the tutorial page. That class was written for the tutorial and is not bundled with the JDK. See source code.

If you want your code to print 0x41 for a byte with a decimal value of 65, either use that tutorial’s class source code, if you can abide by its license terms, or write your own such class from scratch. Alternatively, find a solution on the Stack Overflow page "Java code To convert byte to Hexadecimal", such as calling Integer.toHexString.
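As one possible way to write such a helper yourself, here is a small sketch. The class name `HexBytes` and the method `toHex` are hypothetical, not part of the JDK or the tutorial. It masks each byte with `0xFF` before formatting, because Java bytes are signed: passing a negative byte straight to `Integer.toHexString` sign-extends it (for example `ffffffc3` instead of `c3`), which matters once the string contains non-ASCII characters.

```java
import java.nio.charset.StandardCharsets;

public class HexBytes {
    // Hypothetical helper: formats one byte as 0xNN.
    // The mask turns the signed byte into an int in the range 0..255.
    static String toHex(byte b) {
        return String.format("0x%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        // StandardCharsets.UTF_8 avoids the checked exception thrown by getBytes("UTF-8").
        byte[] bt = "AaHello".getBytes(StandardCharsets.UTF_8);
        for (byte b : bt) {
            System.out.println(toHex(b)); // first line printed is 0x41
        }
    }
}
```

With the question's string this prints 0x41, 0x61, 0x48, 0x65, 0x6C, 0x6C, 0x6F, one per line, which are the same byte values as 65 97 72 101 108 108 111.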

Basil Bourque