What is the difference between UTF-32 and UCS-4?

Question

What is the difference between UTF-32 and UCS-4? Isn't UTF-32 supposed to be a fixed-width encoding?

This question is a useful addition to this site, since it is a programming question and is not yet answered here. Telling someone to 'Google it' is not a valid answer to any question, ever, and has no place on Stack Overflow. – Ian Boyd, Apr 24 '22 at 18:49

score 23 · Answer 1 · edited Feb 24 '20 at 19:04

The Unicode Standard, Version 8.0, Appendix C states:

UCS-4 stands for “Universal Character Set coded in 4 octets.” It is now treated simply as a synonym for UTF-32, and is considered the canonical form for representation of characters in ISO 10646 (Universal Coded Character Set).
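As a rough illustration of "coded in 4 octets" (and of the fixed width the question asks about), the sketch below uses Python's built-in `utf-32-be` codec. The helper name `utf32_octets` and the sample characters are made up for this example and are not part of any standard.

```python
# Sketch: every code point occupies exactly four octets in UTF-32.
# "utf-32-be" is big-endian with no byte order mark.
samples = ["A", "\u00e9", "\u4e00", "\U0001F600"]

def utf32_octets(ch: str) -> bytes:
    """Return the UTF-32BE encoding of a single character (hypothetical helper)."""
    return ch.encode("utf-32-be")

for ch in samples:
    octets = utf32_octets(ch)
    # Print the code point next to its four-octet encoding.
    print(f"U+{ord(ch):04X} -> {octets.hex(' ')} ({len(octets)} octets)")
```

Whether the character is in the BMP or in a supplementary plane, the encoded length stays the same, which is what makes the form fixed-width at the code point level.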

edited Feb 24 '20 at 19:04

Jim U

answered Jun 09 '16 at 08:02

Jonathan Maddox

score 21 · Accepted Answer · edited Aug 06 '18 at 16:52

UTF-32 started as a subset of UCS-4. The two are now identical, except that the UTF-32 standard has additional Unicode semantics. See the details on Wikipedia:

The original ISO 10646 standard defines a 31-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a 32-bit friendly code value in the code space of integers between 0 and hexadecimal 7FFFFFFF.

Because only 17 planes are actually in use, all current code points are between 0 and 0x10FFFF. UTF-32 is a subset of UCS-4 that uses only this range. Since the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes, UTF-32 will be able to represent all Unicode characters. Accordingly, UCS-4 and UTF-32 are now identical except that the UTF-32 standard has additional Unicode semantics.

However, I am not exactly sure what "additional Unicode semantics" means. Maybe someone can provide a better answer.
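The range difference described in the quoted passage can be sketched as two validity checks. This is only an illustration: the function and constant names are invented here, and the exclusion of the surrogate range U+D800..U+DFFF comes from the Unicode Standard's definition of UTF-32 rather than from the passage quoted above.

```python
UCS4_MAX = 0x7FFFFFFF   # upper bound of the original 31-bit UCS-4 code space
UTF32_MAX = 0x10FFFF    # upper bound of the Unicode code space (17 planes)

def is_valid_ucs4(value: int) -> bool:
    """True if value fits the original UCS-4 range (hypothetical helper)."""
    return 0 <= value <= UCS4_MAX

def is_valid_utf32(value: int) -> bool:
    """True if value is a legal UTF-32 code unit (hypothetical helper).

    Assumption beyond the quoted text: surrogate code points are
    excluded, as the Unicode Standard requires for UTF-32.
    """
    in_range = 0 <= value <= UTF32_MAX
    is_surrogate = 0xD800 <= value <= 0xDFFF
    return in_range and not is_surrogate

# 0x110000 lies just past the last Unicode plane: acceptable to the
# original UCS-4 definition, but not to UTF-32.
print(is_valid_ucs4(0x110000), is_valid_utf32(0x110000))
```

Both forms store a value in 32 bits; the checks differ only in which values they accept.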

edited Aug 06 '18 at 16:52

BenMorel

answered May 12 '15 at 09:27

Christian Gollhardt

I personally don't know, @一二三. Maybe we need a better answer with more information about this. – Christian Gollhardt Apr 20 '16 at 02:48

The Wikipedia article says "[clarification needed]". – Keith Thompson Apr 20 '16 at 02:54

Sounds to me like UCS-4 = [0,0x7FFFFFFF] while UTF-32 = [0,0x10FFFF]. Both are represented as 32 bits, but UTF-32 further restricts the range of legal values. – Bill Fraser Oct 28 '16 at 23:13

UTF contains additional properties such as right to left etc. https://en.wikipedia.org/wiki/Unicode_character_property. Otherwise the two are the same. – Ian Apr 23 '19 at 06:37

See http://www.unicode.org/faq/utf_bom.html#utf32-1: “UTF-32 is a subset of the encoding mechanism called UCS-4 in ISO 10646.” – hermannk Oct 04 '20 at 09:50

"Additional Unicode semantics" means the extra properties Unicode adds above and beyond the code points, such as bidirectionality, collation rules, normalization of forms, etc. Some features of Unicode have been implemented in software that technically only supports UCS, but those are extensions that partly implement Unicode. – deriamis Dec 13 '22 at 22:28
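To make the kind of properties mentioned in that comment concrete, here is a small sketch using Python's standard `unicodedata` module. The chosen characters are arbitrary examples, and the snippet only shows that such properties exist alongside the code points; it does not claim to define what the UTF-32 standard means by "additional Unicode semantics".

```python
import unicodedata

# Bidirectional class: a character property, not part of the 32-bit value.
alef = "\u05d0"  # HEBREW LETTER ALEF
print(unicodedata.name(alef), unicodedata.bidirectional(alef))

# Normalization: two different code point sequences for the same text.
decomposed = "e\u0301"                       # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))        # code point counts before/after
```

The encoded 32-bit values alone say nothing about direction or equivalence; those come from the Unicode character database and algorithms.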
