Specify character encoding for writing
You can specify a character encoding with the Charset class.
The NIO.2 framework in modern Java makes easy work of writing text to a file. For example, see Files.writeString.
This code works for me:
String original = "Some текст с кириллицей";
byte[] win1251Bytes = original.getBytes( Charset.forName( "windows-1251" ) ); // The Charset overload throws no checked exception.
Path path = Paths.get( "/Users/whatever/bogus.txt" ); // Home folder on macOS.
try { Files.write( path , win1251Bytes ); } catch ( IOException e ) { throw new RuntimeException( e ); }
Or, this briefer code works too, per the Comment by Holger below.
try
{
Files.writeString(
Paths.get( "/Users/whatever/bogus.txt" ) ,
"Some текст с кириллицей" ,
Charset.forName( "windows-1251" )
);
}
catch ( IOException e )
{
throw new RuntimeException( e );
}
I know nothing about Cyrillic text. I just read the Oracle tutorial first. Then I read the "Writing byte[] to a File in Java" page at Baeldung.com. And in the Javadoc for Charset, I found a mention that if a character set is supported in Java, we should be able to use its name as listed in the IANA Charset Registry. By following that link, I found the name "windows-1251".
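If you are unsure whether a given IANA name is available in your particular Java runtime, you can ask the Charset class before using it. Here is a minimal sketch; the class name CharsetLookup and the method canonicalName are names I made up for illustration.

```java
import java.nio.charset.Charset;

public class CharsetLookup
{
    // Returns the canonical name this JVM uses for the given charset name or alias,
    // or null if the JVM does not support that charset.
    static String canonicalName ( String name )
    {
        if ( ! Charset.isSupported( name ) ) { return null; }
        return Charset.forName( name ).name();
    }

    public static void main ( String[] args )
    {
        System.out.println( canonicalName( "windows-1251" ) );
        System.out.println( canonicalName( "no-such-charset-xyz" ) );  // Legal name syntax, but not a real charset.
    }
}
```

Note that Charset.isSupported throws an IllegalCharsetNameException if the name contains illegal characters, so this sketch assumes a syntactically valid name.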
Run that code to create the file.
Specify character encoding for reading
Open the file in a text editor of your choice. Be sure to tell the app to interpret the octets in the file as Windows-1251 encoding.
Here I chose to use the TextEdit app by Apple, bundled with macOS. In the File > Open dialog box for TextEdit, notice the Options button used to display a list of character encodings. Choose Cyrillic (Windows) there, as that seems to mean Windows-1251.

If the text is properly interpreted, we see the original Cyrillic characters.
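You can also do the reading from Java rather than a text editor, by passing the same Charset to Files.readString. The following is only a sketch: the class name ReadBack, the method roundTrip, and the use of a temporary file are my own choices for illustration. The Cyrillic word is written with Unicode escapes so that the encoding of the source file itself does not affect the result.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadBack
{
    // Writes the text to a temporary file in the named encoding,
    // then reads the file back using that same encoding.
    static String roundTrip ( String text , String charsetName )
    {
        Charset charset = Charset.forName( charsetName );
        try
        {
            Path path = Files.createTempFile( "charset-demo" , ".txt" );
            try
            {
                Files.writeString( path , text , charset );
                return Files.readString( path , charset );
            }
            finally
            {
                Files.deleteIfExists( path );  // Clean up the temporary file.
            }
        }
        catch ( IOException e )
        {
            throw new UncheckedIOException( e );
        }
    }

    public static void main ( String[] args )
    {
        // "Some " followed by the Cyrillic word "tekst", as Unicode escapes.
        String original = "Some \u0442\u0435\u043a\u0441\u0442";
        String readBack = roundTrip( original , "windows-1251" );
        System.out.println( original.equals( readBack ) );
    }
}
```

Files.writeString and Files.readString both require Java 11 or later.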

Defaults
Be aware that in Java 17 and earlier, for most purposes the Java runtime defaults to the character encoding native to the host OS. This default applies to writing and reading text files, among other things.
In Java 18 and later, the Java runtime defaults to UTF-8 character encoding for most purposes. This default applies across all host platforms (macOS, Linux, Windows, etc.). See JEP 400: UTF-8 by Default.
So when you need an alternate character encoding such as windows-1251, always specify the Charset explicitly.
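To see why the explicit Charset matters, here is a rough sketch comparing how the same five Cyrillic letters encode under windows-1251 versus UTF-8, and what happens when the wrong encoding is used to decode. The class name DefaultCharsetDemo and its two helper methods are hypothetical names of mine, not part of any library.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo
{
    // Number of bytes the text occupies when encoded with the named charset.
    static int encodedLength ( String text , String charsetName )
    {
        return text.getBytes( Charset.forName( charsetName ) ).length;
    }

    // Encodes with one charset, decodes with another, and reports whether the text survived.
    static boolean survives ( String text , String writeCharset , String readCharset )
    {
        byte[] bytes = text.getBytes( Charset.forName( writeCharset ) );
        String decoded = new String( bytes , Charset.forName( readCharset ) );
        return text.equals( decoded );
    }

    public static void main ( String[] args )
    {
        // The charset this JVM falls back to when none is specified.
        System.out.println( "Default: " + Charset.defaultCharset() );

        String text = "\u0442\u0435\u043a\u0441\u0442";  // Five Cyrillic letters, as Unicode escapes.
        System.out.println( encodedLength( text , "windows-1251" ) );  // 5, one byte per letter.
        System.out.println( encodedLength( text , "UTF-8" ) );  // 10, two bytes per letter.
        System.out.println( survives( text , "windows-1251" , "UTF-8" ) );  // false, mismatched encodings.
    }
}
```

The first line of output varies by Java version and host OS, which is exactly the reason not to rely on it.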