2

Within our postgres 12 database, using united_states.utf8 collation, we have a dataset with Spanish data that includes accented characters. However when we upper() the values in the field, unaccented characters are correctly uppercased, but accented character are not.

upper('anuncio genérico para web del cliente') gives 'ANUNCIO GENéRICO PARA WEB DEL CLIENTE'

How can I correct this to get the expected result of 'ANUNCIO GENÉRICO PARA WEB DEL CLIENTE'?

I have tried forcing the string into c and posix collations, but these are ANSI only.

A Smith
  • 33
  • 6

1 Answers1

0

I have discovered the problem and a solution to it. My DB, as mentioned in the question, used 'English_United States.utf8' collation as default to match the encoding of UTF8. However on a windows environment, the 'English_United Kingdom.1252' collation works with UTF8 encoding, despite being specified for ANSI, and returns the uppercase characters as expected.

To resolve the issue I had to create the collation in the db using;

CREATE COLLATION "English_United Kingdom.1252" (LC_COLLATE='English_United Kingdom.1252', LC_CTYPE='English_United Kingdom.1252');

Which places it in the public schema. You can then manually correct the issue in queries by calling collate "English_United Kingdom.1252" against any strings with accents, or use

alter table [table_name] alter column [column_name] type text collate "English_United Kingdom.1252"

Against any columns that have accented characters to fix the collation permanently. Unfortunately there is no way to change the default collation for a DB once it is created without doing a full backup, drop, and restore.

A Smith
  • 33
  • 6