
We are ordering a configured Oracle database, and the provider is asking us which character encoding we would like. The application (written in Java) is English-only, but users are from different parts of the world.

Are there any motivations for NOT using UTF8 or another Unicode character set?



But watch out:

Do not use the character set named UTF8 as the database character set unless required for compatibility with Oracle Database clients and servers in version 8.1.7 and earlier, or unless explicitly requested by your application vendor. Despite having a very similar name, UTF8 is not a proper implementation of the Unicode encoding UTF-8. If the UTF8 character set is used where UTF-8 processing is expected, data loss and security issues may occur. This is especially true for Web related data, such as XML and URL addresses.

Oracle recommends AL32UTF8 as the database character set. AL32UTF8 is Oracle's name for the UTF-8 encoding of the Unicode standard.
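The difference shows up with supplementary characters (code points above U+FFFF, such as many emoji). As I understand Oracle's documentation, the legacy UTF8 character set stores such a character as two 3-byte surrogate sequences (the CESU-8 scheme), while AL32UTF8 uses the standard 4-byte UTF-8 form. The sketch below is plain Java with no database involved: `encodeCesu8` is a hand-rolled, hypothetical helper (not an Oracle or JDK API) that imitates CESU-8, compared against the JDK's standard UTF-8 encoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class Cesu8VsUtf8 {

    // Hypothetical helper: encodes each UTF-16 code unit of the string on its own,
    // the way CESU-8 does. A supplementary character (a surrogate pair in Java)
    // therefore becomes two 3-byte sequences instead of one 4-byte sequence.
    static byte[] encodeCesu8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);                          // 1 byte: ASCII
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));            // 2 bytes
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));           // 3 bytes, surrogates included
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Standard UTF-8, which is what AL32UTF8 implements.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "A", e-acute, a CJK ideograph, and U+1F600 (an emoji outside the BMP)
        String[] samples = {"A", "\u00e9", "\u65e5", "\ud83d\ude00"};
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d bytes  CESU-8: %d bytes%n",
                    s.codePointAt(0), encodeUtf8(s).length, encodeCesu8(s).length);
        }
    }
}
```

For everything in the Basic Multilingual Plane the two encodings produce identical bytes, which is why the problem often goes unnoticed until supplementary characters arrive; for U+1F600 you get 4 bytes versus 6, and a strict UTF-8 consumer will reject the 6-byte form.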

  • Thank you very much. Fortunately, AL32UTF8 is what they proposed. :-) – Oct 13 '09 at 12:47

You have two choices to make:

  1. Choose your database character set (used by VARCHAR2, CHAR, CLOB datatypes).
  2. Choose your national character set (used by NVARCHAR2, NCHAR, NCLOB datatypes).

As stated in the Oracle documentation:

Oracle recommends using Unicode for all new system deployments.

National character sets can only be Unicode: UTF-8 or UTF-16. So choosing the same character set for both would be redundant.

My advice (given that your application is English-only):

  • Ask for your database character set to be UTF-8.
  • Ask for your national character set to be UTF-16.
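To get a feel for what each choice costs in storage, the sketch below counts how many bytes the same strings take in UTF-8 (the AL32UTF8 database character set) and in big-endian UTF-16 without a byte-order mark, which is my assumption for what the AL16UTF16 national character set uses. It only measures Java encodings; real Oracle storage also depends on column length semantics and row overhead, which this ignores.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {

    // Bytes needed to store s as UTF-8 (AL32UTF8).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store s as big-endian UTF-16 with no byte-order mark
    // (assumed here to match AL16UTF16).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] labels = {"English", "German", "Japanese"};
        String[] samples = {
            "Hello, world",             // ASCII only
            "Gr\u00fc\u00dfe",          // Latin letters with diacritics
            "\u65e5\u672c\u8a9e"        // three CJK characters
        };
        for (int i = 0; i < samples.length; i++) {
            System.out.printf("%-9s UTF-8: %2d bytes  UTF-16: %2d bytes%n",
                    labels[i], utf8Bytes(samples[i]), utf16Bytes(samples[i]));
        }
    }
}
```

English text takes half the space in UTF-8, while CJK text is smaller in UTF-16, which is one way to read the split above: mostly-English data in the UTF-8 database character set, arbitrary international text in the UTF-16 national one.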

And here is my general advice for your schema definition, table by table and column by column (taking VARCHAR2/NVARCHAR2 as the example):

  • If your column could contain any character in the world (as with user input), make it NVARCHAR2.
  • If you control what is going to be stored (English, in your case), make it VARCHAR2.
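One way to keep the "you control what is stored" side of that rule honest is to validate values in the application before they reach a VARCHAR2 column. The sketch below is a hypothetical helper (`suggestColumnType` is my own name, not an Oracle or JDBC API), and treating "English" as plain US-ASCII is an assumption; swap in whatever charset your controlled data really uses.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class ColumnTypeChooser {

    // Hypothetical rule mirroring the advice above: text the application controls
    // (approximated here as US-ASCII) fits a VARCHAR2 column; anything else,
    // such as arbitrary user input, calls for NVARCHAR2.
    static String suggestColumnType(String sample) {
        // A fresh encoder per call, because CharsetEncoder instances are not thread-safe.
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        return ascii.canEncode(sample) ? "VARCHAR2" : "NVARCHAR2";
    }

    public static void main(String[] args) {
        System.out.println(suggestColumnType("ORDER_STATUS_SHIPPED"));
        System.out.println(suggestColumnType("Fran\u00e7ois M\u00fcller"));
    }
}
```

In practice you would run a check like this in a unit test or at the boundary where "controlled" values are produced, so that a stray non-English value is caught in the application rather than discovered in the data later.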
Mac
  • I'll add more links as soon as I can get access to the Oracle docs (site is down for now). – Mac Oct 09 '09 at 15:10
  • The Oracle site is up again, and reading the documentation made me change my answer slightly. – Mac Oct 12 '09 at 10:08

Are there any motivations for NOT using UTF8 or another Unicode character set?

Just one: you have an existing dataset whose current character encoding you can't guarantee.

In that case, you probably want to remedy that and still use UTF8.
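A first step in remedying such a dataset is finding out which of its values are not valid UTF-8. The sketch below assumes you can get the raw bytes of each legacy value into Java (how you extract them is outside its scope) and uses a strict JDK `CharsetDecoder` that reports malformed input instead of silently substituting U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns true only if the bytes form valid UTF-8. REPORT makes the decoder
    // throw on malformed or unmappable input instead of replacing it.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);          // 63 61 66 C3 A9
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);   // 63 61 66 E9
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false: E9 starts a sequence that never completes
    }
}
```

A negative result is conclusive, but a positive one is not proof: pure ASCII is valid in every encoding discussed here, and some Latin-1 byte sequences happen to form valid UTF-8, so treat this as a filter for values that need a closer look.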

Dan Carley

No, not at all.

Jan Jungnickel

Half a joke: yes, you can no longer connect with old clients that don't support UTF.

slovon