3

I am developing a website by using ASP.net and my DB is MYSQL. In there users can submit articles. This site goes internationally so I dont want to restrict the language only to English.

So I decided few things. Please guide me If I made the wrong choice.

1) I choose utf8mb4 as database charset. Because it is an improved version of UTF8 for store further characters. Am I made the right choice? I mean I have only few tables where need to use utf8mb4. So Shall I use Latin1 as Database charset?

2) I dont have an idea which collation to use for above charset. I decided to use utf8mb4 swedish_ci. Or should I use general Ci or any other?

3) In my tables most of tables not needed utf8mb4 charset. Latin 1 swedesh will do the work. So can I maintain selected tables under specific charset and collation even DB is in another Charset and collation?

4) Can I use utf8mb4 charset for a specific column in a table which have Latin1 swedesh as charset?

If those can do what is the relationship between database charset, table charset and column charsets?

Is different charsets lead to any performance issues?

Thank you very much.

Prageeth Liyanage
  • 1,612
  • 2
  • 19
  • 41

1 Answers1

4

The database charset is inherited by the table, unless you override it. (I recommend being specific at the table level.)

The table charset is inherited by the columns in the table. Since one usually has only one charset, this inheritance is fine. Also, it is pretty clear when you do SHOW CREATE TABLE what each column is set to -- without having to look at the database or system.

Go international -- use utf8 or utf8mb4. I agree that utf8mb4 is a better choice, especially for Chinese and some emoticons.

character_set_% -- Only _client, _connection, and _results are important. And these are the three that are set by SET NAMES utf8mb4. Leave the rest alone.

The default collation for utf8mb4 is utf8mb4_general_ci, which is possibly a good choice if you have multiple languages. The other choice is utf8mb4_unicode_ci . I talk more about "combining diacriticals" in http://mysql.rjweb.org/doc.php/charcoll#combining_diacriticals . This section gives examples of where those two collations differ: http://mysql.rjweb.org/doc.php/charcoll#utf8_collations_examples

See also the "Best Practice" section.

latin1 is smaller than utf8 for Western European text. MySQL will do the proper conversions when needed, so that is not a problem. But I prefer not to confuse the programmer by mixing character sets. Keep in mind that converting an existing table column from latin1 to utf8 takes some effort, possible downtime, and maybe risk.

4) Can I use utf8mb4 charset for a specific column in a table which have Latin1 swedesh as charset?

Yes. Each column (but not each row) can have a different character set and/or collation.

The existence of different charsets is not a performance, per se. What could bite you is WHERE col1 = col2 (and other cases) when the two columns have a different character set and/or collation. MySQL will abandon an otherwise perfectly good index if it sees a difference that is not easy to handle.

Rick James
  • 135,179
  • 13
  • 127
  • 222