I have read many old posts about converting nchar and nvarchar to a Postgres data type. SQL Server uses Unicode (UTF-16) for nchar and nvarchar, and according to:

How can I store UTF-16 characters in a Postgres database?

the conversion is risky business. When creating a Postgres database, the encoding can be set to UTF-8, so this could be an approximate solution. However, for the latest versions of Postgres:

Is Postgres now supporting UTF-16 or something similar?

If so, starting with which Postgres version?

Thanks in advance

Jose Cabrera Zuniga
  • UTF-8 can support all characters that UTF-16 can, so there is no real need for UTF-16 (which also typically needs more storage than UTF-8) –  Feb 21 '22 at 20:01
  • It is unclear why the encoding servers use *internally* should matter to you. In any sane setup, you are never going to be copying raw bytes retrieved from one server to another, since values are normally typed. That is, you retrieve those strings as strings, in whatever encoding your client uses, and you send them as strings. This only becomes an issue in "sloppy" languages that don't have first-class support for encodings and instead treat strings as byte sequences that *you* are supposed to interpret at every access, but that's separate from what database engines are doing. – Jeroen Mostert Feb 21 '22 at 21:21
  • SQL Server has recently started to support storing strings internally as UTF-8, but the primary motivation there was to have less friction on Linux, where UTF-16 is a relatively uncommon encoding and not used inside the kernel (unlike Windows), so that the overhead of converting back and forth to UTF-16 every time is hard to justify. If you did have to shuttle strings as byte sequences without ever interpreting them, that would probably be a more valuable avenue to explore than making Postgres store UTF-16. – Jeroen Mostert Feb 21 '22 at 21:24
  • Niggle... by default the string handling functions in SQL Server assume that `nchar`, `nvarchar` and `ntext` types store UCS-2 character data, which is not the same as UTF-16 encoding. That doesn't stop applications from storing and retrieving UTF-16 data in those columns, but it will cause unexpected behavior from the various string handling functions when code points outside the Basic Multilingual Plane are encountered. When an SQL Server database uses an *_SC (Supplementary Characters) collation the string functions get swapped out to be UTF-16 compatible. – AlwaysLearning Feb 21 '22 at 22:04
  • Microsoft provides a summary of the UCS-2/UTF-16 behaviour differences in most of the string handling functions at [Collation and Unicode support - Supplementary characters](https://learn.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver15#Supplementary_Characters) – AlwaysLearning Feb 21 '22 at 22:07
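
The points made in the comments can be checked with a small Python sketch. It assumes nothing about either database: the sample string and the variable names are made up for illustration. It shows that text decoded from UTF-16 re-encodes to UTF-8 without loss, and that a character outside the Basic Multilingual Plane counts as two UTF-16 code units, which is what a UCS-2-style length function would report.

```python
# Sample text (illustrative only): ASCII, U+00E9 (e with acute accent),
# U+4E2D (a CJK character), and U+1F600 (an emoji outside the
# Basic Multilingual Plane).
text = "abc \u00e9 \u4e2d \U0001F600"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte-order mark

# UTF-16 bytes -> str -> UTF-8 bytes -> str: nothing is lost on the way.
round_tripped = utf16_bytes.decode("utf-16-le").encode("utf-8").decode("utf-8")

code_points = len(text)                    # count seen by a UTF-16-aware function
utf16_code_units = len(utf16_bytes) // 2   # count seen by a UCS-2-style function

print(round_tripped == text)               # True
print(len(utf8_bytes), len(utf16_bytes))   # 15 20
print(code_points, utf16_code_units)       # 9 10
```

For this particular sample, UTF-8 is the smaller encoding. Text dominated by CJK characters can come out larger in UTF-8 than in UTF-16, so the storage comparison depends on the data.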
