42

For the following MySQL create database statement, what would be the equivalent in postgresql?:

CREATE DATABASE IF NOT EXISTS `scratch` 
  DEFAULT CHARACTER SET = utf8
  DEFAULT COLLATE = utf8_unicode_ci;

I currently have:

CREATE DATABASE "scratch"
  WITH OWNER "postgres"
  ENCODING 'UTF8'
  TABLESPACE "pg_default";

Is that enough or should I be more specific including LOCALE as well?

Wil Moore III
  • 6,968
  • 3
  • 36
  • 49

1 Answers1

69

Yes, you can be more specific.

For example:

CREATE DATABASE "scratch"
  WITH OWNER "postgres"
  ENCODING 'UTF8'
  LC_COLLATE = 'en_US.UTF-8'
  LC_CTYPE = 'en_US.UTF-8';

Also I recommend to read the following pages about locales and collations in PostgreSQL:

Wil Moore III
  • 6,968
  • 3
  • 36
  • 49
michael.bochkaryov
  • 2,318
  • 18
  • 14
  • 1
    Thanks Michael. This was more of a confirmation as I came up with the same; however, I never see examples like this _anywhere_ but I always prefer to be 100% explicit. I had a peek at the docs before asking this question; however, while I still came up with exactly the syntax you gave me, I still did not feel confident that I wasn't missing something. That to me tends to allude to the fact that the docs could be a bit better on providing examples. – Wil Moore III Apr 01 '12 at 20:52
  • 5
    This worked for me but I needed to add the following constraint in order to make it work: `TEMPLATE = template0` – Lucas Nov 27 '15 at 13:11
  • 7
    pg's collations are extremely confusing. On mysql, utf8mb4_unicode_ci is the standard - all languages in one collation. This isn't an RFC or ISO defined behaviour, but it works! The whole point of unicode/utf8 is to support all languages *seamlessly*. To specify utf8, and then be forced to sort according a single collation like en_US is not only confusing, but useless. The requirement is to show English, Chinese, Japanese et al on the same page in some kind of defined order! I can't just assume that every client is working with a single primary language. How the heck to support multi-language? –  Sep 13 '16 at 02:49
  • To expand, if I'm not mistaken, you are required to specify a single language as a collation tied to that utf8 charset. But the entire reason we want utf8 is to simultaneously support multiple languages stored in the same columns. How do we manage to have both English and Chinese stored in a utf8 column in postgres, while managing to collate properly across those languages? It makes absolutely no sense to have utf8, while being restricted to a single language at a time. I can't help but feel like postgres snubs its nose at us saying "that's not semantically how it works", leaving us to drown. –  Sep 13 '16 at 02:52
  • 1
    @Nate, If you need linguistically correct sorting of strings in a multi-language application, you'll have to do sorting in your application anyways. Unicode strings are, if done properly, collated using the [Unicode Collation Algorithm](http://unicode.org/reports/tr10/) which in turn uses a "Collation Element Table". It's impossible to write a CET that provides linguistically correct sorting for multiple languages at once, for the reasons stated in [1.9.2](http://unicode.org/reports/tr10/#Non-Goals). – Max Truxa Sep 13 '16 at 21:31
  • This does not create a case-insensitive database like requested in the question. – Berend de Boer May 09 '17 at 03:50