I have the feeling this is most likely a duplicate, but I'm unable to find it.
NOTE: My Python knowledge is very limited, so I'm not 100% sure how strings, bytes, and encodings work in Python. My knowledge of encodings in general is also not too great.
Let's say we have the string "Aä$$€h". It contains three different ordinary ASCII characters (A$h), and two non-ASCII characters (ä€). In Python we have the following code:
# coding: utf-8
input = u'Aä$$€h'
print [ord(c) for c in input.encode('utf-8')]
# Grouped per character:
print [[ord(x) for x in c.encode('utf-8')] for c in input]
Which will output:
[65, 195, 164, 36, 36, 226, 130, 172, 104]
[[65], [195, 164], [36], [36], [226, 130, 172], [104]]
Now I'm looking for a Java equivalent giving this same integer array. I know all Strings in Java are by default encoded with UTF-16, and only byte arrays can have an actual encoding. I thought the following code would give the result I expected:
String input = "Aä$$€h";
byte[] byteArray = input.getBytes(java.nio.charset.StandardCharsets.UTF_8);
System.out.println(java.util.Arrays.toString(byteArray));
But unfortunately it gives the following result instead:
[65, -61, -92, 36, 36, -30, -126, -84, 104]
I'm not sure where these negative values are coming from.
So my question is mostly this:
Given a String in Java containing non-ASCII characters (e.g. "Aä$$€h"), output its ordinal UTF-8 integers, similar to what the Python ord function returns for a UTF-8 encoded byte. That we already have a Java String is a precondition for this question.
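For reference, here is a minimal sketch of what I'm after. It assumes the negative values come from Java's byte type being signed (range -128 to 127), so each byte is masked with 0xFF to widen it to an int in 0..255. The class name Utf8Ordinals and the method names toUnsignedUtf8 and groupedUtf8 are made up for illustration, and the per-character grouping goes by Unicode code point, which may not be the only reasonable way to split the string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, named for illustration only.
class Utf8Ordinals {

    // Encode the string as UTF-8 and widen each signed byte to an int in 0..255.
    static int[] toUnsignedUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int[] result = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            result[i] = bytes[i] & 0xFF; // e.g. -61 becomes 195
        }
        return result;
    }

    // Same values, grouped per Unicode code point (assumed grouping).
    static int[][] groupedUtf8(String s) {
        return s.codePoints()
                .mapToObj(cp -> toUnsignedUtf8(new String(Character.toChars(cp))))
                .toArray(int[][]::new);
    }

    public static void main(String[] args) {
        // Same string as in the question, written with Unicode escapes
        // so the source file encoding does not matter.
        String input = "A\u00e4$$\u20ach";
        System.out.println(Arrays.toString(toUnsignedUtf8(input)));
        // prints [65, 195, 164, 36, 36, 226, 130, 172, 104]
        System.out.println(Arrays.deepToString(groupedUtf8(input)));
        // prints [[65], [195, 164], [36], [36], [226, 130, 172], [104]]
    }
}
```

If this is roughly right, the masking step would be the only difference from my original attempt, since getBytes already seems to produce the correct UTF-8 bytes.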