1

This is a request for clarification concerning the Swisscom Cloud MariaDB Service. This is not about the generic MariaDB default charactersets.

  1. I would like to know what the default character_set_server for the MariaDB Service on the Swisscom Cloud is defined as.
  2. Assuming that it is actually latin1, I would like to know if this default can be changed (to utf-8 or more accurately utf8mb4 as it is for the debian mariadb package).

The main reason for my question is that I am having trouble creating backups that can easily be re-imported and I am getting widely different size output when extracting via mysqldump ... -r backup.sql compared to an export through Sequel Pro.

Chris
  • 3,245
  • 4
  • 29
  • 53

2 Answers2

2

I don't use the tool Sequel Pro.

The legacy MariaDB/Galera cluster uses latin1 as default character set when you don't set it. This default setting is from upstream MariaDB developers. Most developers and DBAs don't like this decision.

Here is the CREATE TABLE syntax (see COLLATE and CHARACTER SET) :

CREATE TABLE tbl_name (column_list)
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]]

Server settings (this statement can every customer execute, no admin rights needed):

MariaDB [(none)]> show global variables like 'character_set%';
+--------------------------+-------------------------------------+
| Variable_name            | Value                               |
+--------------------------+-------------------------------------+
| character_set_client     | latin1                              |
| character_set_connection | latin1                              |
| character_set_database   | latin1                              |
| character_set_filesystem | binary                              |
| character_set_results    | latin1                              |
| character_set_server     | latin1                              |
| character_set_system     | utf8                                |
| character_sets_dir       | /usr/share/mariadb-galera/charsets/ |
+--------------------------+-------------------------------------+
8 rows in set (0.00 sec)

If you still use legacy MariaDB, please hurry up and migrate your service instance. The legacy cluster will be in one or two days read-only.

We have a completely new MariaDB service in store for you. It comes with many improvements (e.g. MariaDB version 10) and we recommend to upgrade as soon as possible.

This step-by-step guide was created Aug 16, 2017.

The new cluster uses better default settings:

MariaDB [(none)]> show global variables like 'character_set%';
+--------------------------+------------------------------------------------------------------------------------------+
| Variable_name            | Value                                                                                    |
+--------------------------+------------------------------------------------------------------------------------------+
| character_set_client     | utf8                                                                                     |
| character_set_connection | utf8                                                                                     |
| character_set_database   | utf8                                                                                     |
| character_set_filesystem | binary                                                                                   |
| character_set_results    | utf8                                                                                     |
| character_set_server     | utf8                                                                                     |
| character_set_system     | utf8                                                                                     |
| character_sets_dir       | /var/vcap/data/packages/mariadb/95a1896c4bcdccb4e1abd81b7d00eb33aedb0da0/share/charsets/ |
+--------------------------+------------------------------------------------------------------------------------------+
8 rows in set (0.01 sec)

If you wish to change your character set and collation before importing the data into new cluster.

ALTER TABLE tbl_name
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]

We did this many times. Please keep in mind that you shouldn't do this with > 20000 records.

Please also have a look at our documentation for MariaDB

Please contact Swisscom Application Cloud support when you need to issue long running DDL changes (for example ALTER TABLE with thousends of records). Long running DDL causes an outage for all customers on the cluster.

Sybil
  • 2,503
  • 3
  • 25
  • 36
  • 1
    Great to see that the new MariaDB is UTF-8 by default - we already did the switch but had issues when restoring backups which seem to be related to a mix of charsets – Chris Feb 08 '18 at 08:01
  • 1
    Well... MySQL's `utf8` is a subset of `UTF-8`. It would be better to go all the way to `utf8mb4`. – Rick James Feb 13 '18 at 02:39
  • 1
    That `ALTER` does not do anything other than change the _default_ for newly added columns !. See `ALTER ... CONVERT TO ...` instead. – Rick James Feb 13 '18 at 02:40
1

This is to demonstrate two different ALTERs; one does nothing useful; one does what you want:

-- Try the 'wrong' ALTER:

    DROP TABLE IF EXISTS atc;
    CREATE TABLE atc ( c VARCHAR(11))  CHARACTER SET latin1 ;
    INSERT INTO atc (c) VALUES ('ÃÔäô');
    SHOW CREATE TABLE atc\G
    SELECT c, HEX(c) FROM atc;   -- note the latin1 encoding
    ALTER TABLE atc CHARACTER SET utf8;
    SHOW CREATE TABLE atc\G   -- the column stays latin1, overriding the table
    SELECT c, HEX(c) FROM atc;   -- still latin1 encoding

mysql> 
mysql> -- Try the 'wrong' ALTER:
mysql> 
mysql> DROP TABLE IF EXISTS atc;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE atc ( c VARCHAR(11))  CHARACTER SET latin1 ;
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO atc (c) VALUES ('ÃÔäô');
Query OK, 1 row affected (0.00 sec)

mysql> SHOW CREATE TABLE atc\G
*************************** 1. row ***************************
       Table: atc
Create Table: CREATE TABLE `atc` (
  `c` varchar(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> SELECT c, HEX(c) FROM atc;   -- note the latin1 encoding
+----------+----------+
| c        | HEX(c)   |
+----------+----------+
| ÃÔäô     | C3D4E4F4 |
+----------+----------+
1 row in set (0.00 sec)

mysql> ALTER TABLE atc CHARACTER SET utf8;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE atc\G   -- the column stays latin1, overriding the table
*************************** 1. row ***************************
       Table: atc
Create Table: CREATE TABLE `atc` (
  `c` varchar(11) CHARACTER SET latin1 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> SELECT c, HEX(c) FROM atc;   -- still latin1 encoding
+----------+----------+
| c        | HEX(c)   |
+----------+----------+
| ÃÔäô     | C3D4E4F4 |
+----------+----------+
1 row in set (0.00 sec)

-- Try the 'right' way:

    DROP TABLE IF EXISTS atc;
    CREATE TABLE atc ( c VARCHAR(11))  CHARACTER SET latin1 ;
    INSERT INTO atc (c) VALUES ('ÃÔäô');
    SHOW CREATE TABLE atc\G
    SELECT c, HEX(c) FROM atc;   -- note the latin1 encoding
    ALTER TABLE atc CONVERT TO CHARACTER SET utf8;
    SHOW CREATE TABLE atc\G   -- column and table are utf8
    SELECT c, HEX(c) FROM atc;   -- now utf8

mysql> 
mysql> -- Try the 'right' way:
mysql> 
mysql> DROP TABLE IF EXISTS atc;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE atc ( c VARCHAR(11))  CHARACTER SET latin1 ;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO atc (c) VALUES ('ÃÔäô');
Query OK, 1 row affected (0.00 sec)

mysql> SHOW CREATE TABLE atc\G
*************************** 1. row ***************************
       Table: atc
Create Table: CREATE TABLE `atc` (
  `c` varchar(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> SELECT c, HEX(c) FROM atc;   -- note the latin1 encoding
+----------+----------+
| c        | HEX(c)   |
+----------+----------+
| ÃÔäô     | C3D4E4F4 |
+----------+----------+
1 row in set (0.00 sec)

mysql> ALTER TABLE atc CONVERT TO CHARACTER SET utf8;
Query OK, 1 row affected (0.03 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE atc\G   -- column and table are utf8
*************************** 1. row ***************************
       Table: atc
Create Table: CREATE TABLE `atc` (
  `c` varchar(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> SELECT c, HEX(c) FROM atc;   -- now utf8
+----------+------------------+
| c        | HEX(c)           |
+----------+------------------+
| ÃÔäô     | C383C394C3A4C3B4 |
+----------+------------------+
1 row in set (0.00 sec)

mysql> 
Rick James
  • 135,179
  • 13
  • 127
  • 222