Cyrillic text in Java

Question

I'm trying to transfer Russian text to Excel, SQLite, or any other program. The result is always the same: РђР±РёСЃСЃРёРЅСЃРєР°СЏ РєРѕС€РєР°. I understand that something is wrong with the encoding.

I tried

String myString = "some cyrillic text";
byte bytes[] = myString.getBytes("UTF-8");
String value = URLEncoder.encode(new String(bytes, "Windows-1251"), "Windows-1251");

But that doesn't help either.

If you want to see the foreign text, the program that displays it must have a suitable font assigned, and the charset encoding must be installed on your machine too. If it is UTF-8, then the national Locale must be set on any components designed to store and render the foreign text in the application. — Samuel Marchant, Mar 05 '23 at 10:13
Unrelated to your question, but you really should not declare arrays as `byte bytes[]`. Although it is valid syntax, it is considered a historic oddity, and the recommended syntax is `byte[] bytes`. — Mark Rotteveel, Mar 05 '23 at 11:07

Stephen C · Answer 1 · 2023-03-05T05:58:05.590

String myString = "some cyrillic text";
byte bytes[] = myString.getBytes("UTF-8");

Now bytes contains a UTF-8 encoding of the string. If you were to call new String(bytes, "UTF-8") you would get back an equivalent string to the original one.
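A minimal sketch of that round trip, in case it helps to see it run. The class name Utf8RoundTrip and the helper roundTrip are made up for illustration, and the sample text is written with Unicode escapes so that the encoding of the source file cannot interfere.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class Utf8RoundTrip {
    // Encodes the text as UTF-8 bytes, then decodes those bytes as UTF-8 again.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "koshka" (cat) in Cyrillic letters, written as Unicode escapes
        String original = "\u043a\u043e\u0448\u043a\u0430";
        // Same charset on both sides, so the text survives unchanged.
        System.out.println(original.equals(roundTrip(original))); // true
    }
}
```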

But ...

String value = URLEncoder.encode(
    new String(bytes, "Windows-1251"),  // HERE
    "Windows-1251");

... at HERE you are decoding with the wrong character encoding. The String constructor takes your word for it ... and the result is mangled characters.

Understand this:

The bytes array contains just the encoded text. It doesn't contain anything to identify the encoding scheme. So the String constructor has no way of knowing what the correct encoding is ... apart from what you tell it. And it has no (reliable) way of knowing if the encoding you told it is correct. Let alone fixing your mistake.
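To make the mismatch concrete, here is a rough sketch that encodes with UTF-8 and then decodes with Windows-1251, which is the step that goes wrong in the question. The class name WrongCharsetDemo and the helper mangle are hypothetical, and the sketch assumes your JDK ships the windows-1251 charset (standard builds do, stripped-down runtime images may not).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class WrongCharsetDemo {
    // Assumption: the windows-1251 charset is available in this JDK.
    static final Charset CP1251 = Charset.forName("windows-1251");

    // Encodes as UTF-8, then decodes the same bytes as Windows-1251 (the mistake).
    static String mangle(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1251);
    }

    public static void main(String[] args) {
        // "koshka" (cat) in Cyrillic letters, written as Unicode escapes
        String original = "\u043a\u043e\u0448\u043a\u0430";
        String mangled = mangle(original);
        // Each Cyrillic letter is two bytes in UTF-8, and each byte is decoded
        // as a separate Windows-1251 character, so the length doubles.
        System.out.println(original.length() + " -> " + mangled.length()); // 5 -> 10
        // ASCII is encoded identically in both charsets, so it is unaffected.
        System.out.println(mangle("abc")); // abc
    }
}
```

The doubled length matches the pattern in the question, where every letter of the original text shows up as a pair of unrelated characters.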

The correct way to do what your code does is this:

String myString = "some cyrillic text"; 
String value = URLEncoder.encode(myString, "Windows-1251");

However ... we don't have sufficient context to know whether that is what is actually required for your application.
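For reference, this is roughly what the corrected call produces. The class name UrlEncodeDemo and the wrapper encode are hypothetical; the wrapper only exists to deal with the checked UnsupportedEncodingException that URLEncoder.encode(String, String) declares.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the original code.
class UrlEncodeDemo {
    // Percent-encodes the text using the named charset.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Charset not available: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "koshka" (cat) in Cyrillic letters, written as Unicode escapes
        String text = "\u043a\u043e\u0448\u043a\u0430";
        System.out.println(encode(text, "Windows-1251")); // %EA%EE%F8%EA%E0
        System.out.println(encode(text, "UTF-8"));        // %D0%BA%D0%BE%D1%88%D0%BA%D0%B0
    }
}
```

Either way the output is a percent-escaped ASCII string meant for URLs and form data, so whether it is useful depends on what the receiving program expects.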
