1

What does "CHAR(1)" in a UTF-8 encoded DB2 database mean?

Can I insert a special character (e.g. one that takes 2 octets in UTF-8) into a column of CHAR(1)?

Or does CHAR(1) in UTF-8 always mean, that it has capacity for one byte / octet, i.e. such that inserting an Umlaut into it will fail ?

I read through this interesting developerWorks article, but it's going to deep for my simple question...

Bastl
  • 2,926
  • 5
  • 27
  • 48

1 Answers1

5

It depends. :)

DB2 introduced code units to help with designing string-typed columns that are based on number of characters and not number of bytes. The CREATE TABLE statement has an overview of data types and also explains CHAR and VARCHAR. If number of characters is used, DB2 is assuming the worst case - 4 bytes/octets per character - for length computations.

The database configuration string_units determines whether by default the number of characters (CODEUNITS32) or the number of bytes (SYSTEM) is considered.

Coming back to your question: If you did not specify anything, inserting a special character that needs 2 octets into a CHAR(1) will likely fail. If CODEUNITS32 was specified, then it will succeed.

data_henrik
  • 16,724
  • 2
  • 28
  • 49
  • Thanks! That seems to answer the question. Can you please add information when this has been introduced? In the documentation code units appear with v10.5.0 ... – Bastl Jan 30 '17 at 12:07
  • Yes, that is correct. You can use the drop-down in the docs to switch to other versions. `string_units` is only available for 10.5 and up. – data_henrik Jan 30 '17 at 12:17
  • And just out of curiosity: How was this done before? Fixing the codepage at CREATE TABLE such that 8bit were always enough ? – Bastl Jan 30 '17 at 12:19
  • No, UTF-8 and other Unicode codepages are supported since a long time. UTF-8 is the default codepage since IMHO V9.5. Globalization / i18n requires some thoughts and good planning... – data_henrik Jan 30 '17 at 12:27