How can I store all UTF-16 "characters" in a Postgres database?
Short answer: this is not directly possible, because the only Unicode encoding PostgreSQL supports for stored text is UTF-8. UTF-16 is not available as a server encoding.
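UTF-16 based platforms such as Java, JavaScript and Windows can contain unpaired ("half") surrogates. An unpaired surrogate is a single UTF-16 code unit in the range D800 to DFFF (hex) whose partner is missing. Such code units are not Unicode scalar values, so they have no representation in well-formed UTF-8 or UTF-32. They are easily created by taking a substring of a Java, JavaScript or VB.NET string that cuts a surrogate pair in half. Because they cannot be represented in UTF-8, they cannot be stored in a database such as PostgreSQL that only supports a UTF-8 character set.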
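To make the substring problem concrete, here is a minimal Python sketch that emulates a Java/JavaScript-style substring by slicing at UTF-16 code-unit indices. Python strings are indexed by code point, so the helpers to_utf16_units, from_utf16_units, utf16_substring and is_utf8_storable are hypothetical and exist only to simulate UTF-16 semantics. "Can be encoded as strict UTF-8" is used here as a stand-in for "can be stored in a UTF-8 text column"; no database driver is involved.

```python
def to_utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (what Java/JavaScript strings index by)."""
    units: list[int] = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Supplementary character: split into a high and a low surrogate.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return units


def from_utf16_units(units: list[int]) -> str:
    """Rebuild a Python str, joining valid surrogate pairs and keeping lone ones."""
    out: list[str] = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else -1
        if 0xD800 <= u <= 0xDBFF and 0xDC00 <= nxt <= 0xDFFF:
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)))
            i += 2
        else:
            out.append(chr(u))  # BMP character or unpaired surrogate
            i += 1
    return "".join(out)


def utf16_substring(s: str, start: int, end: int) -> str:
    """Hypothetical emulation of Java's String.substring(start, end)."""
    return from_utf16_units(to_utf16_units(s)[start:end])


def is_utf8_storable(s: str) -> bool:
    """True if s can be encoded as well-formed UTF-8."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    s = "a\U0001F600b"  # 'a', grinning face emoji, 'b': 4 UTF-16 code units
    half = utf16_substring(s, 0, 2)
    print(repr(half), is_utf8_storable(half))  # 'a\ud83d' False
```

Cutting the string after two code units leaves "a" followed by a lone high surrogate, which a strict UTF-8 encoder rejects, while cutting after three code units keeps the emoji intact.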
Windows path names, for example, may contain unpaired surrogates and therefore cannot always be converted to UTF-8 (https://github.com/rust-lang/rust/issues/12056).
One would have to use a database system that supports a UTF-16 or CESU-8 character set, which is better suited to Java/Android, JavaScript/Node.js and .NET/wchar_t/Windows languages and platforms.
SQL Server, Oracle (whose UTF8 character set is in fact CESU-8), DB2, Informix, HANA, SQL Anywhere and MaxDB typically support such a character set.
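For comparison, CESU-8 (Unicode Technical Report #26) encodes each UTF-16 code unit separately, so a supplementary character becomes two 3-byte sequences instead of one 4-byte UTF-8 sequence. The Python sketch below assumes the lenient behaviour of also encoding an unpaired surrogate as a 3-byte sequence; strict CESU-8 as specified only covers well-formed UTF-16, so treat this as an illustration rather than a conforming implementation. cesu8_encode and cesu8_decode are hypothetical names; Python's "surrogatepass" error handler does the per-unit work.

```python
def to_utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s."""
    units: list[int] = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return units


def from_utf16_units(units: list[int]) -> str:
    """Rebuild a Python str, joining valid surrogate pairs and keeping lone ones."""
    out: list[str] = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else -1
        if 0xD800 <= u <= 0xDBFF and 0xDC00 <= nxt <= 0xDFFF:
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)))
            i += 2
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)


def cesu8_encode(s: str) -> bytes:
    """CESU-8-style encoding: each UTF-16 code unit gets its own 1-3 byte sequence.

    Lenient on purpose (assumption): an unpaired surrogate is encoded like any other unit.
    """
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in to_utf16_units(s))


def cesu8_decode(data: bytes) -> str:
    """Inverse of cesu8_encode; 'surrogatepass' yields each surrogate as its own code point."""
    units = [ord(c) for c in data.decode("utf-8", "surrogatepass")]
    return from_utf16_units(units)


if __name__ == "__main__":
    emoji = "\U0001F600"
    print(emoji.encode("utf-8").hex())    # f09f9880 (UTF-8: one 4-byte sequence)
    print(cesu8_encode(emoji).hex())      # eda0bdedb880 (CESU-8: two 3-byte sequences)
    print(cesu8_encode("a\ud83d").hex())  # 61eda0bd (the lone surrogate survives)
```

Because every code unit, paired or not, maps to a byte sequence, a CESU-8 (or UTF-16) column can hold exactly what a UTF-16 string holds, which is the property the answer relies on.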
Note that many emoji are Unicode code points outside the Basic Multilingual Plane, which UTF-16 stores as surrogate pairs, so these differences will become more relevant for Western users as well.
On PostgreSQL you may:
a) accept the losses,
b) store the data as binary data (e.g. in a bytea column), or
c) translate it to an encoded representation. For example, JSON escapes code units as \uXXXX and writes a character outside the BMP as two escaped surrogates (RFC 4627, Section 2.5, https://www.rfc-editor.org/rfc/rfc4627), so that half surrogates can be transported without loss in a UTF-8/ASCII based network format.
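Options b) and c) can be sketched in Python as follows, assuming the application's text is viewed as a sequence of UTF-16 code units. to_bytea packs the code units as UTF-16LE bytes suitable for a bytea column, and to_json_text relies on json.dumps with its default ensure_ascii=True, which writes every non-ASCII code unit (including an unpaired surrogate) as a \uXXXX escape and so yields pure ASCII that a UTF-8 text column accepts. The function names are hypothetical, and no database driver call is shown.

```python
import json
import struct


def to_utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s."""
    units: list[int] = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return units


def from_utf16_units(units: list[int]) -> str:
    """Rebuild a Python str, joining valid surrogate pairs and keeping lone ones."""
    out: list[str] = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else -1
        if 0xD800 <= u <= 0xDBFF and 0xDC00 <= nxt <= 0xDFFF:
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)))
            i += 2
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)


# Option b): keep the exact UTF-16 code units as bytes (e.g. for a bytea column).
def to_bytea(s: str) -> bytes:
    units = to_utf16_units(s)
    return struct.pack(f"<{len(units)}H", *units)  # little-endian 16-bit code units


def from_bytea(data: bytes) -> str:
    units = struct.unpack(f"<{len(data) // 2}H", data)  # raises struct.error on odd length
    return from_utf16_units(list(units))


# Option c): escape to pure ASCII text that any UTF-8 text column accepts.
def to_json_text(s: str) -> str:
    return json.dumps(s)  # ensure_ascii=True by default: non-ASCII becomes \uXXXX


def from_json_text(text: str) -> str:
    return json.loads(text)


if __name__ == "__main__":
    half = "a\ud83d"  # e.g. the result of a careless UTF-16 substring
    print(to_bytea(half).hex())  # 61003dd8
    print(to_json_text(half))    # "a\ud83d"
    assert from_bytea(to_bytea(half)) == half
    assert from_json_text(to_json_text(half)) == half
```

The binary form is exact but opaque to SQL string functions and collations; the escaped form stays searchable as text, at the cost of escaping every non-ASCII character.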
Depending on the application-server language (UTF-16 based Java, Scala, C#/Windows or JavaScript/Node.js versus UTF-8 based Go) and on the investment in language support (e.g. splitting strings at grapheme cluster boundaries with ICU, https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries, instead of simple truncation), the issue may be less relevant. However, most enterprise systems and languages fall in the UTF-16 camp today, and their software typically uses simple substring operations.
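The difference between simple truncation and boundary-aware truncation can be sketched as below. This is deliberately weaker than the ICU approach mentioned above: truncate_surrogate_safe (a hypothetical helper) only avoids cutting between the two halves of a surrogate pair. It does not implement UAX #29 grapheme cluster segmentation, for which something like ICU's character BreakIterator would be needed. The last demo line shows a thumbs-up emoji with a skin-tone modifier being split into valid but separate characters.

```python
def to_utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s."""
    units: list[int] = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return units


def is_high(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def is_low(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


def from_utf16_units(units: list[int]) -> str:
    """Rebuild a Python str, joining valid surrogate pairs and keeping lone ones."""
    out: list[str] = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else -1
        if is_high(u) and is_low(nxt):
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)))
            i += 2
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)


def truncate_naive(s: str, max_units: int) -> str:
    """Like Java/JavaScript s.substring(0, max_units): may cut a surrogate pair."""
    return from_utf16_units(to_utf16_units(s)[:max_units])


def truncate_surrogate_safe(s: str, max_units: int) -> str:
    """Truncate to at most max_units UTF-16 code units without splitting a surrogate pair.

    NOT grapheme-aware: combining marks or emoji modifiers can still be separated.
    """
    units = to_utf16_units(s)
    cut = max(0, min(max_units, len(units)))
    if 0 < cut < len(units) and is_high(units[cut - 1]) and is_low(units[cut]):
        cut -= 1  # back off so the pair stays whole (result is one unit shorter)
    return from_utf16_units(units[:cut])


if __name__ == "__main__":
    s = "a\U0001F600b"
    print(repr(truncate_naive(s, 2)))           # 'a\ud83d' (unpaired surrogate)
    print(repr(truncate_surrogate_safe(s, 2)))  # 'a'
    thumbs = "\U0001F44D\U0001F3FD"  # thumbs up + skin tone modifier: one grapheme
    print(truncate_surrogate_safe(thumbs, 2) == "\U0001F44D")  # True: grapheme still split
```

The safe variant always produces storable UTF-8, but only a grapheme-aware splitter also keeps user-perceived characters intact.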