10

I'm attempting to move a SQL Server DB which is used by a C# application (+EF6) to Postgres 12 but I'm not having much luck with getting case-insensitive string comparisons working. The existing SQL Server db uses SQL_Latin1_General_CP1_CI_AS collation which means all WHERE clauses don't have to worry about case.

I understand that CIText was the way to do this previously, but is now superseded by non-deterministic collations.

I created such a collation;

CREATE COLLATION ci (provider = icu, locale = 'und-u-ks-level2', deterministic = false);

and when this is applied to the CREATE TABLE on a per-column basis it does work - case is ignored.

CREATE TABLE casetest (
id serial NOT NULL,
code varchar(10) null COLLATE "ci",
CONSTRAINT "PK_id" PRIMARY KEY ("id"));

But from what I have read it must be applied to every varchar column and can't be set globally across the whole db.

Is this correct?

I don't want to use .ToLower() everywhere due to clutter and that any index on the column is then not used.

I tried modifying the pre-existing 'default' collation in pg_collation to match the settings of 'ci' collation but it has no effect.

Thanks in advance. PG

cachius
  • 1,743
  • 1
  • 8
  • 21
Ziloni
  • 101
  • 1
  • 6

1 Answers1

8

You got it right. From PostgreSQL v15 on, ICU collations can be used as database collations, but only deterministic ones (that don't compare different strings as equal). So your case-insensitive collation wouldn't work there either. Since you are using v12, you cannot use ICU collations as database default collation at all, but have to use them in column definitions.

This limitation is annoying and not in the nature of things. It will probably be lifted in some future version.

You can use a DO statement to change the collation of all string columns:

DO
$$DECLARE
   v_table  regclass;
   v_column name;
   v_type   oid;
   v_typmod integer;
BEGIN
   FOR v_table, v_column, v_type, v_typmod IN
      SELECT a.attrelid::regclass,
             a.attname,
             a.atttypid,
             a.atttypmod
      FROM pg_attribute AS a
         JOIN pg_class AS c ON a.attrelid = c.oid
      WHERE a.atttypid IN (25, 1042, 1043)
        AND c.relnamespace::regnamespace::name
            NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
   LOOP
      EXECUTE
         format('ALTER TABLE %s ALTER %I SET DATA TYPE %s COLLATE ci',
                v_table,
                v_column,
                format_type(v_type, v_typmod)
         );
   END LOOP;
END;$$;
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • At the PostgreSQL Commitfest 2019-11 this [feature](https://commitfest.postgresql.org/25/2256/) was 'returned with feedback'. So I guess it won't come. – cachius Nov 26 '20 at 15:07
  • @ReinState Another patch for that may make it in the future; people are aware that it is a shortcoming. – Laurenz Albe Nov 26 '20 at 15:12
  • 1
    This seems to be a feature of the commercial 'EDB Postgres Advanced Server'. You can set default ICU collations per database like `CREATE DATABASE collation_db TEMPLATE template0 ENCODING 'UTF8' ICU_SHORT_FORM = 'AN_CU_EX_NX_LROOT';`. Nice examples in the [old docs](https://www.enterprisedb.com/edb-docs/d/edb-postgres-advanced-server/user-guides/user-guide/9.6/EDB_Postgres_Advanced_Server_Guide.1.24.html)([Other versions and formats](https://www.enterprisedb.com/edb-docs/c/user-guides)). No mentions of nondeterministic collations though. – cachius Nov 26 '20 at 16:40
  • 2
    If you want queries to just work™ case-insensitive, you can use this script to convert column types to [citext](https://www.postgresql.org/docs/current/citext.html). It's an [additional supplied module](https://www.postgresql.org/docs/current/contrib.html#footer) that calls lower() before making queries. Import once via `CREATE EXTENSION citext;` and change the format line to `format('ALTER TABLE %s ALTER COLUMN %I SET DATA TYPE CITEXT'`. This has the benefit that `LIKE, ILIKE and ~*` [keep working](https://dba.stackexchange.com/questions/266097) and enables portable SQL via `LIKE` statements. – cachius Nov 27 '20 at 16:11
  • 1
    Is this still true for version 15? – dipold Feb 16 '23 at 16:37
  • 2
    @dipold In v15, you can use ICU collations as database collations, but not case insensitive ones. – Laurenz Albe Feb 16 '23 at 19:20