
I am trying to change the encoding of a String from UTF-8 to windows-1251, but all my solutions work only with Latin letters. I want to change the encoding of a String that contains Cyrillic. How can I do it correctly?

All solutions that create a new String from bytes fail to preserve the Cyrillic letters.

For example:

UTF-8: Some текст с кириллицей and latin
windows-1251: Some текст СЃ кириллицей and latin

Basil Bourque
Mark Krass
  • 1
    Please show your code. – Codo Jan 31 '23 at 07:31
  • Is this all the code? I can't see anything related UTF-8. – Codo Jan 31 '23 at 08:15
  • And please add the code to the question and not as a comment. – Codo Jan 31 '23 at 08:17
  • 1
    [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – Seelenvirtuose Jan 31 '23 at 08:19
  • 1
    @Codo I must assume that the [comment](https://stackoverflow.com/questions/75293780/save-cyrillic-while-change-string-encoding-from-utf-8-to-windows-1251-in-java#comment132861745_75293780) above was as reply to my comment (despite missing the `@user` annotation) I am not the author of the question, I posted a suggestion for the OP (I did not use `@Codo` in my first comment, should be implicit that it is addressed to OP [even if the site notified you]) And sure it is not all code, but the *heart* of how it can be done. – user16320675 Jan 31 '23 at 17:27

1 Answer


Specify character encoding for writing

You can specify a character encoding with the Charset class.

The NIO.2 framework in modern Java makes easy work of writing text to a file, for example with Files.writeString.

This code works for me:

String original = "Some текст с кириллицей";
// Encode as Windows-1251. Charset.forName throws only unchecked exceptions, so no try/catch is needed here.
byte[] win1251Bytes = original.getBytes( Charset.forName( "windows-1251" ) );
Path path = Paths.get( "/Users/whatever/bogus.txt" );  // Home folder on macOS.
try { Files.write( path , win1251Bytes ); } catch ( IOException e ) { throw new RuntimeException( e ); }

Or, this briefer code works too (Java 11 and later), per the Comment by Holger below.

try
{
    Files.writeString(
            Paths.get( "/Users/whatever/bogus.txt" ) ,
            "Some текст с кириллицей" ,
            Charset.forName( "windows-1251" ) 
    );
}
catch ( IOException e )
{
    throw new RuntimeException( e );
}

I know nothing about Cyrillic text. I first read the Oracle tutorial, then the Writing byte[] to a File in Java page at Baeldung.com. In the Javadoc for Charset, I found a mention that if a character set is supported in Java, we should be able to use its name as listed in the IANA Charset Registry. By following that link, I found the name "windows-1251".
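If you want to confirm that your own Java runtime supports that name before relying on it, a minimal sketch along these lines should do. The class name `CharsetCheck` and the helper `canonicalNameOf` are my own placeholders; `Charset.isSupported`, `Charset.forName`, `name`, and `aliases` are standard methods of `java.nio.charset.Charset`.

```java
import java.nio.charset.Charset;

public class CharsetCheck
{
    // Returns the canonical name for the requested charset, or null if this runtime lacks it.
    static String canonicalNameOf ( String requestedName )
    {
        if ( ! Charset.isSupported( requestedName ) ) { return null; }
        return Charset.forName( requestedName ).name();
    }

    public static void main ( String[] args )
    {
        String requested = "windows-1251";
        if ( ! Charset.isSupported( requested ) )
        {
            System.out.println( requested + " is not supported by this runtime." );
            return;
        }
        Charset charset = Charset.forName( requested );
        System.out.println( "Canonical name: " + charset.name() );
        System.out.println( "Aliases: " + charset.aliases() );
    }
}
```

The list of aliases varies by JDK, so treat whatever it prints as informational only.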

Run that code to create the file.

Specify character encoding for reading

Open the file in a text editor of your choice. Be sure to tell the app to interpret the octets in the file as Windows-1251 encoding.

Here I chose to use the TextEdit app by Apple, bundled with macOS. In the File > Open dialog box for TextEdit, notice the Options button used to display a list of character encodings. Choose Cyrillic (Windows) there, as that seems to mean Windows-1251.

screenshot of TextEdit.app File > Open > Options dialog box

If the text is properly interpreted, we see the original Cyrillic characters.

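The same rule applies when reading the file from Java rather than from a text editor: pass the charset explicitly when turning the bytes back into a String. Here is a rough sketch, assuming Java 8 or later and a scratch temp file instead of the path used above. The class name `Win1251RoundTrip` and its method are hypothetical names I made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class Win1251RoundTrip
{
    static final Charset WIN_1251 = Charset.forName( "windows-1251" );

    // Writes the text to a temp file as Windows-1251, then decodes the file's bytes with the same charset.
    static String roundTripThroughFile ( String text )
    {
        try
        {
            Path path = Files.createTempFile( "bogus" , ".txt" );  // Hypothetical scratch file.
            Files.write( path , text.getBytes( WIN_1251 ) );
            String readBack = new String( Files.readAllBytes( path ) , WIN_1251 );
            Files.delete( path );
            return readBack;
        }
        catch ( IOException e )
        {
            throw new UncheckedIOException( e );
        }
    }

    public static void main ( String[] args )
    {
        String original = "Some текст с кириллицей";
        String readBack = roundTripThroughFile( original );
        System.out.println( "Round trip preserved the text: " + readBack.equals( original ) );
    }
}
```

On Java 11 and later, `Files.readString( path , WIN_1251 )` should be a shorter way to do the reading step.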

Defaults

Be aware that in Java 17 and earlier, for most purposes the Java runtime defaults to the character encoding native to the host OS. This default applies to writing and reading text files, among other things.

As of Java 18, the Java runtime defaults to UTF-8 character encoding for most purposes. This default applies across all host platforms (macOS, Linux, Windows, etc.). See JEP 400: UTF-8 by Default.

So when you need an alternate character encoding such as Windows-1251, always specify the Charset explicitly.
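To see what goes wrong when the writer and the reader disagree, here is a small sketch that encodes text as UTF-8 and then deliberately decodes those bytes as Windows-1251. I am only assuming this is the kind of mismatch behind the garbled "СЃ" in the Question; the class name `MojibakeDemo` and its method are hypothetical. A Java String itself has no encoding to swap; an encoding comes into play only when converting between a String and bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo
{
    // Encodes the text as UTF-8, then (wrongly) decodes those bytes as Windows-1251.
    static String misreadUtf8AsWin1251 ( String text )
    {
        byte[] utf8Bytes = text.getBytes( StandardCharsets.UTF_8 );
        return new String( utf8Bytes , Charset.forName( "windows-1251" ) );
    }

    public static void main ( String[] args )
    {
        // The platform default that applies when no charset is passed explicitly.
        System.out.println( "Default charset: " + Charset.defaultCharset() );
        // Latin letters survive the mismatch, while each Cyrillic letter becomes two wrong characters.
        System.out.println( misreadUtf8AsWin1251( "Some текст с кириллицей" ) );
    }
}
```

The Cyrillic letter с is two bytes in UTF-8 (0xD1 0x81), and Windows-1251 reads each of those bytes as a separate character, which is why Latin-only text appears unaffected.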

Basil Bourque
  • 2
    You can use the method for writing a string in the first place, `Files.writeString(path, original, Charset.forName("windows-1251"));` (since JDK 11) or `Files.write(path, Collections.singleton(original), Charset.forName("windows-1251"));` (since JDK 7) – Holger Feb 01 '23 at 16:31