From locale to ansi codepage to java charset?

Question

Is there a way to get a java.nio.charset.Charset from an ANSI code page, and the ANSI code page from a locale? For example, if I have the locale "en_US", I want the charset "cp1252", so I can call

private final Charset CS1252 = Charset.forName("cp1252");

or, when I have the locale "ja_JP" for Japanese, I want to get the corresponding charset, like

private final Charset CS932 = Charset.forName("ms932");

How can I achieve that in Java? What I need is a method like getCharsetForLocale(java.util.Locale loc).

It seems the answer does not answer the question. Actually, in Apple JDK 6, we can get the charset for a language and country, i.e. a locale. If we change the system language to Simplified Chinese, the default charset returns GB2312, while if you change the setting to Traditional Chinese (HK), the default charset is Big5. The default charset just returns the system file.encoding value; how the JVM initializes it is unclear to me, because system properties are initialized in native code and I do not know how to check it. — xiaohei, Oct 23 '19 at 03:38

score 4 · Accepted Answer · answered May 22 '12 at 08:58

You can't, and it does not make sense. Any language can be written with several different character encodings. For example, English could be written with ASCII, ISO-8859-1, ISO-8859-15, Windows-1252, UTF-7, UTF-8, UTF-16, UTF-32 and many more, including basically all the Windows code pages.

I am not sure what you are looking for, so let me suggest this:

If you are looking to save the data, use UTF-8 regardless of Locale. Always. Yes, always. Don't worry about the space: for many languages it is efficient enough, and disk space is cheap.
If you want to know what kind of character encoding users might use, it is not valid to think they are restricted to a single one. Instead, you may think of detecting the encoding, using ICU Charset Detector for example (read more about detection here).
If you want to know the current code page of the system, the easiest way to do that (and it is OS independent!) is to call Charset.defaultCharset().
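As a minimal sketch of the first and last suggestions, the snippet below prints what Charset.defaultCharset() reports and encodes a string as UTF-8 regardless of that default. The class name DefaultCharsetDemo and the helper toUtf8 are hypothetical names chosen for illustration. One caveat, as far as I know: since JDK 18 the default charset is UTF-8 unless file.encoding is overridden, and the system property native.encoding (present on JDK 17 and later, null on older versions) is what reflects the OS setting there.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    /** Hypothetical helper: encodes text as UTF-8, ignoring the platform default. */
    public static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The charset the JVM uses when none is given explicitly.
        Charset platform = Charset.defaultCharset();
        System.out.println("Default charset: " + platform.name());
        System.out.println("Known aliases:   " + platform.aliases());

        // Assumption: only newer JDKs (17+) set this property; older ones print null.
        System.out.println("native.encoding: " + System.getProperty("native.encoding"));

        // Unicode escapes keep the source file ASCII-only; the string is "Gruesse" with u-umlaut and sharp s.
        byte[] stored = toUtf8("Gr\u00FC\u00DFe");
        System.out.println("UTF-8 length: " + stored.length + " bytes");
    }
}
```

Passing the charset explicitly, as toUtf8 does, means the stored bytes do not change when the program runs on a machine with a different code page.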

Next time, please try to describe your problem first, what you want to achieve and what you have already tried.

Thanks, I looked further into the problem, and I need to find the ANSI code page for a locale name. And for this ANSI code page I need a Charset object in Java. — Christian Schiepe, May 22 '12 at 10:05
@Christian: If it was .NET, it would be quite easy. Unfortunately, there is no direct equivalent of CultureInfo in Java, therefore you simply need to map this "by hand". — Paweł Dyda, May 22 '12 at 11:58
Python (sitting on top of C/POSIX concepts and underlying libraries) has a concept of a "default encoding for a locale". The equivalent here would be "default Charset for Locale". As much as standardizing on UTF-8 is great, the idea that certain locales likely require certain encodings, and this information may be available programmatically, is not unreasonable. — Adam Burke, Apr 04 '19 at 02:19
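The "by hand" mapping mentioned in the comments could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class name LocaleCodePages, the choice to key the table on the language code only, and the table itself, which is a small, incomplete excerpt of the usual Windows ANSI code page assignments. It is not an API the JDK provides. It needs Java 9+ for Map.of, and it returns an empty Optional when the locale is not in the table or the runtime does not ship the charset.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class LocaleCodePages {

    // Hand-maintained and deliberately incomplete: language code -> Java charset name
    // of the Windows ANSI code page commonly associated with that language.
    private static final Map<String, String> ANSI_BY_LANGUAGE = Map.of(
            "en", "windows-1252",   // alias: cp1252
            "de", "windows-1252",
            "fr", "windows-1252",
            "pl", "windows-1250",
            "ru", "windows-1251",
            "ja", "windows-31j");   // alias: MS932

    /** Looks up the charset for a locale; empty if unmapped or unsupported by this JVM. */
    public static Optional<Charset> getCharsetForLocale(Locale loc) {
        String name = ANSI_BY_LANGUAGE.get(loc.getLanguage());
        if (name == null || !Charset.isSupported(name)) {
            return Optional.empty();
        }
        return Optional.of(Charset.forName(name));
    }

    public static void main(String[] args) {
        for (Locale loc : new Locale[] {Locale.US, Locale.JAPAN, Locale.forLanguageTag("xx")}) {
            System.out.println(loc + " -> "
                    + getCharsetForLocale(loc).map(Charset::name).orElse("(no mapping)"));
        }
    }
}
```

Languages whose code page depends on the country or script (Chinese, for example) would need a key that includes more than the language code, which this sketch leaves out.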

score 0 · Answer 2 · answered May 22 '12 at 07:23

I think you are trying to retrieve the Canonical Name of the Charset which can be obtained through the name() method of the class Charset.

Korhan Ozturk


What I have is only a locale like "en_US" and NO charset yet! OK, what I could do is iterate over all available charsets and try to match my locale against the canonical name, and when I have a match, I also have my charset. But this does not seem to be the best solution. — Christian Schiepe, May 22 '12 at 07:28

score 0 · Answer 3 · answered May 22 '12 at 07:41

AFAIK, there is no intrinsic connection between locale and charset. Which charset do you expect for example for locale en_US? ASCII/CP1252/MacRoman/ISO-8859-1/UTF-8/UTF-16?

And for Japanese, you could use any of at least Shift JIS, CP932, EUC-JP, ISO-2022-JP, or UTF-8.
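To make the point concrete, the following sketch encodes the same short Japanese string with each of the charsets listed above and reports the byte count. The class name JapaneseEncodings and the helper encodedLength are hypothetical. The Java charset names used ("Shift_JIS", "windows-31j" for CP932, and so on) are assumed to be available, which is normally the case on a full JDK but not guaranteed on a trimmed runtime, so unsupported names are skipped.

```java
import java.nio.charset.Charset;
import java.util.List;

public class JapaneseEncodings {

    /** Returns the encoded size in bytes, or -1 if this JVM lacks the charset. */
    public static int encodedLength(String text, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return -1;
        }
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        // Three kanji (the word for "Japanese language"), written as escapes
        // so the source file does not depend on the compiler's encoding.
        String text = "\u65E5\u672C\u8A9E";

        for (String name : List.of("Shift_JIS", "windows-31j", "EUC-JP", "ISO-2022-JP", "UTF-8")) {
            int length = encodedLength(text, name);
            System.out.println(name + ": "
                    + (length < 0 ? "not supported by this JVM" : length + " bytes"));
        }
    }
}
```

All of these are legitimate encodings for text in the ja_JP locale, so the locale alone cannot tell you which one a given file or system uses.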
