
I need a function in Java that does the same as the Asc function in Visual Basic. I've been looking for it on the internet, but I couldn't find a solution.

The string whose character codes I need was created in Visual Basic. It uses the ISO 8859-1 and Microsoft Windows Latin-1 character sets. Visual Basic's Asc function returns those codes, but in Java I can't find a function that does the same thing.

In Java I have this code:

String myString = "ÅÛ–ßÕÅÝ•ÞÃ";
int first = (int) myString.charAt(0);  // "Å" - VB and Java return: 197
int second = (int) myString.charAt(1); // "Û" - VB and Java return: 219
int third = (int) myString.charAt(2);  // "–" - VB returns: 150 and Java returns: 8211

I have no problem with the first two characters, but for the third character Java returns a different code than VB.

How can I get the same codes in Java as in VB?

1 Answer


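First of all, note that ISO 8859-1 != Windows Latin-1 (Windows-1252). The two differ in the range 0x80-0x9F, where Windows-1252 has printable characters such as – and • and ISO 8859-1 defines no printable characters. (See http://en.wikipedia.org/wiki/Windows-1252)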
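A small sketch of that difference using Java's built-in charsets (the class and method names are illustrative, not from the answer). Unicode escapes are used for the characters so the result does not depend on the encoding of the source file. Note that `String.getBytes(Charset)` silently replaces characters the charset cannot represent with its replacement byte, which is `?` (63) for ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1VsCp1252 {

    // Encodes s with cs and returns each byte as an unsigned value (0-255).
    // Unmappable characters become the charset's replacement byte ('?' for ISO-8859-1).
    public static int[] codes(String s, Charset cs) {
        byte[] bytes = s.getBytes(cs);
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String aRing = "\u00C5";  // A with ring above
        String enDash = "\u2013"; // en dash
        System.out.println(codes(aRing, cp1252)[0]);                       // 197
        System.out.println(codes(aRing, StandardCharsets.ISO_8859_1)[0]);  // 197
        System.out.println(codes(enDash, cp1252)[0]);                      // 150
        System.out.println(codes(enDash, StandardCharsets.ISO_8859_1)[0]); // 63 ('?'): no en dash in ISO 8859-1
    }
}
```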

The problem is that Java stores strings as UTF-16, so casting a char to int gives its UTF-16 code unit, which for characters like these is the Unicode code point (U+2013 = 8211 for the en dash), not the Windows-1252 code.
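To see why only the third character differs: the first 256 Unicode code points (U+0000 to U+00FF) are identical to ISO 8859-1, so for Å and Û the UTF-16 value happens to equal the Windows-1252 byte, while the en dash sits at U+2013 in Unicode but at 0x96 in Windows-1252. The sketch below (illustrative names) just prints what the cast in the question returns.

```java
public class CharCodes {

    // Returns the UTF-16 code unit at index i; this is all that (int) s.charAt(i) does.
    // For characters in the Basic Multilingual Plane it equals the Unicode code point.
    public static int utf16Unit(String s, int i) {
        return s.charAt(i); // char widens to int implicitly
    }

    public static void main(String[] args) {
        // First three characters of the question's string: A-ring, U-circumflex, en dash
        String myString = "\u00C5\u00DB\u2013";
        for (int i = 0; i < myString.length(); i++) {
            int unit = utf16Unit(myString, i);
            System.out.printf("U+%04X = %d%n", unit, unit);
        }
    }
}
```

It prints `U+00C5 = 197`, `U+00DB = 219` and `U+2013 = 8211`.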

To get the Windows Latin-1 (Cp1252) code of a char, first encode the string into a byte array with that charset:

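import java.nio.charset.Charset;

public class Encoding {

    public static void main(String[] args) {
        // Cp1252 is Windows code page 1252 ("Windows Latin-1")
        byte[] bytes = "ÅÛ–ßÕÅÝ•ÞÃ".getBytes(Charset.forName("Cp1252"));
        for (byte b: bytes) {
            // Java bytes are signed; & 255 turns e.g. -106 back into 150
            System.out.println(b & 255);
        }
    }

}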

prints:

197
219
150
223
213
197
221
149
222
195
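Wrapped as a reusable function, closer to what the question asks for, the same idea might look like this sketch. `VbAsc` and `asc` are hypothetical names; like the answer, it hard-codes Windows-1252, so it matches VB's `Asc()` only when VB runs with code page 1252. It uses `Byte.toUnsignedInt` (Java 8+), which is equivalent to `b & 255`, and checks `CharsetEncoder.canEncode` first so that characters outside Windows-1252 raise an error instead of silently becoming `?`.

```java
import java.nio.charset.Charset;

public class VbAsc {

    // Windows code page 1252; "Cp1252" is an alias of the same charset
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Hypothetical helper: returns the windows-1252 code (0-255) of c.
    // Throws if c has no windows-1252 representation.
    public static int asc(char c) {
        // canEncode may change encoder state, so ask a fresh encoder each time
        if (!CP1252.newEncoder().canEncode(c)) {
            throw new IllegalArgumentException(String.format("U+%04X is not in windows-1252", (int) c));
        }
        byte b = String.valueOf(c).getBytes(CP1252)[0]; // exactly one byte for a mappable char
        return Byte.toUnsignedInt(b); // same as b & 255: maps -128..-1 to 128..255
    }

    public static void main(String[] args) {
        // A-ring, U-circumflex, en dash: the first three characters from the question
        String myString = "\u00C5\u00DB\u2013";
        for (char c : myString.toCharArray()) {
            System.out.println(asc(c));
        }
    }
}
```

For those three characters it prints 197, 219 and 150, the same values VB's `Asc()` returns in the question.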
Adrian Leonhard
  • It works fine! Thank you very much. Your explanation was excellent for understanding the problem and the solution. – Marcelo Gonzaga Silva Feb 21 '15 at 22:03
  • It helps to know that VB6's `Asc()` function is a slow and obsolete feature, included for backward compatibility, that was replaced by `AscW()`. The old `Asc()` first converts to ANSI, too. But the Java code given above is not equivalent and will break down when the current code page is something else, whereas VB6's `Asc()` always uses the current code page instead of a hard-coded one. I have no clue why the code above ANDs byte values with 255; it looks like cargo-culting. – Bob77 Feb 21 '15 at 22:58
  • @Bob77 println(byte) will output negative values for values above 127. – Adrian Leonhard Feb 22 '15 at 00:51
  • Ahh, my mistake. Shows that I was not paying attention! – Bob77 Feb 22 '15 at 09:07
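Regarding the point that VB6's `Asc()` uses the current code page rather than a fixed one: a sketch of a Java counterpart could take the charset as a parameter and read the platform's encoding. This assumes JDK 17 or later, where the `native.encoding` system property names the OS encoding (on Windows, the ANSI code page such as Cp1252); since JDK 18, `Charset.defaultCharset()` is UTF-8 by default, so it no longer stands in for the ANSI code page. `AnsiAsc`, `platformCharset` and `asc` are hypothetical names, and double-byte code pages are deliberately not handled.

```java
import java.nio.charset.Charset;

public class AnsiAsc {

    // Assumption: JDK 17+, where "native.encoding" names the OS encoding.
    // If the property is missing (older JDKs), fall back to the default charset,
    // which on those JDKs is typically the platform encoding as well.
    public static Charset platformCharset() {
        String name = System.getProperty("native.encoding");
        return name != null ? Charset.forName(name) : Charset.defaultCharset();
    }

    // Hypothetical helper: code (0-255) of c in a caller-chosen single-byte charset.
    // Unmappable characters come back as the charset's replacement byte (usually '?' = 63),
    // because String.getBytes(Charset) substitutes instead of throwing.
    public static int asc(char c, Charset cs) {
        byte[] bytes = String.valueOf(c).getBytes(cs);
        if (bytes.length != 1) {
            // e.g. UTF-8 or a double-byte code page: not handled by this sketch
            throw new IllegalArgumentException(String.format(
                    "%s encodes U+%04X in %d bytes", cs.name(), (int) c, bytes.length));
        }
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        Charset ansi = platformCharset();
        System.out.println("Platform encoding: " + ansi.name());
        try {
            // 150 when the platform code page is windows-1252
            System.out.println(asc('\u2013', ansi));
        } catch (IllegalArgumentException e) {
            // e.g. on UTF-8 platforms, where the en dash takes 3 bytes
            System.out.println(e.getMessage());
        }
        // An explicit code page gives the same answer on every platform
        System.out.println(asc('\u2013', Charset.forName("windows-1252")));
    }
}
```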