5

I want to use an ICU system-insensitive sorting collation, to avoid sorting differences between postgres11-on-mac vs postgres11-on-Ubuntu. My first test was to dump out my existing Collate=en_US.UTF-8 and pg_restore them into a db created with Collate=en-US-x-icu

Create Database doc has this to say:

To create a database music with a different locale:

CREATE DATABASE music LC_COLLATE 'sv_SE.utf8' LC_CTYPE 'sv_SE.utf8' TEMPLATE template0;

I seem to have the required icu locales already:

select collname, collprovider from pg_collation where collname like 'en_US%';

        collname        | collprovider
------------------------+--------------
 en_US.UTF-8            | c
 en_US                  | c
 en_US.ISO8859-15       | c
 en_US.ISO8859-1        | c
 en_US                  | c
 en_US                  | c
 en-US-x-icu            | i 
 en-US-u-va-posix-x-icu | i 
(8 rows)

But no luck when creating a database with either icu locales.

CREATE DATABASE test LC_COLLATE = 'en-US-x-icu' TEMPLATE template0;
ksysdb=# CREATE DATABASE test LC_COLLATE = 'en-US-x-icu' TEMPLATE template0;
ERROR:  invalid locale name: "en-US-x-icu"

I can use LC_COLLATE with other locales:

The LC_COLLATE clause does seem to come with some strings attached, such as watching your encoding and specifying an appropriate template. But it seems to give error hints w non-ICU locales.

This works, for example: CREATE DATABASE test LC_COLLATE = 'en_US' TEMPLATE template0;

and this one gives a helpful user message:

ksysdb=# CREATE DATABASE test LC_COLLATE = 'en_US.ISO8859-15' TEMPLATE template0;
ERROR:  encoding "UTF8" does not match locale "en_US.ISO8859-15"
DETAIL:  The chosen LC_COLLATE setting requires encoding "LATIN9".

Note: a related question, PostgreSQL 10 on Linux - LC_COLLATE locale en_US.utf-8 not valid, doesn't seem all that relevant, as the answer talks about generating an OS-level locale to fix the issue. While the ICU locales, as far as I understand, are expressly intended to be separated from the underlying OS.

cachius
  • 1,743
  • 1
  • 8
  • 21
JL Peyret
  • 10,917
  • 2
  • 54
  • 73
  • 1
    Currently you can't use ICU locales as the default locale –  Apr 05 '20 at 20:15
  • so.... specify it via ALTER COLUMN only? or at CREATE TABLE time? And is this also true on postgresql12, even though the question is for 11 right now? – JL Peyret Apr 05 '20 at 20:16
  • Yes, that's correct –  Apr 05 '20 at 20:16
  • 1
    if you write it up and link to some postgresql doc, so that people can see when/if the restriction is lifted, I'll accept. – JL Peyret Apr 05 '20 at 20:17
  • 1
    The only documentation is the source: `check_locale` in `src/backend/utils/adt/pg_locale.c` uses `setlocale` to check if the locale name is valid. That is a C library function, consequently it will only work with C library collations. – Laurenz Albe Apr 06 '20 at 06:40
  • This seems to be a feature of the commercial 'EDB Postgres Advanced Server'. It seems you can set default ICU collations per database like `CREATE DATABASE collation_db TEMPLATE template0 ENCODING 'UTF8' ICU_SHORT_FORM = 'AN_CU_EX_NX_LROOT';` and even per cluster via `initdb --icu-short-form`. Nice examples in the [old docs](https://www.enterprisedb.com/edb-docs/d/edb-postgres-advanced-server/user-guides/user-guide/9.6/EDB_Postgres_Advanced_Server_Guide.1.24.html#pID0E0TCD0HA). – cachius Nov 26 '20 at 16:04
  • Over at https://stackoverflow.com/questions/59168841 a nice conversion script is given to convert all text columns to icu collations. – cachius Nov 26 '20 at 16:09
  • Is this still true for version 15? – dipold Feb 16 '23 at 16:33

1 Answers1

1

This is possible from PostgreSQL v15 on:

CREATE DATABASE test
   LOCALE_PROVIDER icu
   ICU_LOCALE "en-US"
   LOCALE "en_US.utf8"
   TEMPLATE template0;

You still need to provide the libc locale, but the ICU library is used for collations.

From PostgreSQL v16 on, ICU will be the default locale.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263