According to the MySQL 8 docs:

Several questions about character set and collation handling for client connections can be answered in terms of system variables:

  • What character set are statements in when they leave the client?
    The server takes the character_set_client system variable to be the character set in which statements are sent by the client.
  • What character set should the server translate statements to after receiving them?

    To determine this, the server uses the character_set_connection and collation_connection system variables:

    The server converts statements sent by the client from character_set_client to character_set_connection. Exception: For string literals that have an introducer such as _utf8mb4 or _latin2, the introducer determines the character set. See Section 10.3.8, “Character Set Introducers”.

After reading this quote, I am confused. Does the documentation mean that, if an introducer is used, the introducer's character set replaces character_set_connection for that string literal?

To give a specific example: is there any difference between executing select _gbk '中文'; and executing select '中文';? How does the introducer _gbk affect the server's character set conversion for statements?

I hope someone can help explain the meaning of the official description and of my example. Thank you very much.

Reference link: charset-connection

Jinke2017
  • The introducer is relevant (only) for the (one) next string. – Solarflare Jul 11 '20 at 17:06
  • I know. So will the introducer replace character_set_connection when the server processes **this statement**? – Jinke2017 Jul 12 '20 at 01:46
  • I am not sure where you get the "statement" from. Your quote states "string literal" (so `'data'`). If there were another string in that statement, the introducer would not apply to it. It is like casting: you cast exactly one value. I am not entirely sure what you are getting at. – Solarflare Jul 12 '20 at 10:54
  • I would like to know how the MySQL server transforms the whole SQL statement when it contains an introducer. My understanding: the server splits the statement into two parts, the string with the introducer and the rest of the statement, and then performs character set conversion on each part separately. Is there something wrong with my idea? – Jinke2017 Jul 12 '20 at 14:18
  • MySQL doesn't have to do any splitting of the statement based on characterset. A string is just a bunch of bytes. It's like using hex numbers, e.g. (pseudo code) `select _hex '10'` is something different than `select _dec '10'`, but there is nothing special about what MySQL would need to do with those (until it needs to compare/store it in a potentially different format, e.g. something like `where _hex '10' = _dec '10'`) - unless I (still) misunderstand your question and you are asking something different. – Solarflare Jul 12 '20 at 20:18
  • I agree that a string is just a bunch of bytes. But the same character may be encoded differently in different character sets, which may produce garbled characters during conversion. We know that the MySQL server uses character set variables such as character_set_client and character_set_connection to convert the statement after receiving it. I want to know how introducers affect this conversion process. – Jinke2017 Jul 13 '20 at 13:59
  • MySQL's documentation explains this in the passage quoted in my question (**Exception: For string literals that have an introducer such as _utf8mb4 or _latin2, the introducer determines the character set.**), but I don't understand what it means. As a specific example, is there any difference between executing `select _gbk '中文';` and executing `select '中文';`? How does the introducer `_gbk` affect the server's character set conversion for statements? My English is not good, but I will try my best to express my question clearly. Thank you. – Jinke2017 Jul 13 '20 at 14:00
  • I still don't understand your problem. (I understand that you have one, but I cannot grasp it.) I have to assume you understand how character sets work in principle? Can you confirm that? The introducer is just an exception to the default value (the connection setting) for the (one!) next string value. I'll try to answer your specific example, but I still don't think it's your actual problem: if you send data bytes that look like `select '中文'` at your end to a MySQL server with the connection setting `latin1`, the server (which thus reads those bits and bytes as latin1) will see `select '中文'`. – Solarflare Jul 13 '20 at 15:29
  • Yes, I want to understand how character sets work **when using an introducer**. You have explained the `select '中文';` situation. Do you mean that the server will convert the SQL string from `character_set_client` to `character_set_connection`? – Jinke2017 Jul 14 '20 at 06:30
  • If my understanding matches what you said, can you describe the character set conversion process for the `select _gbk '中文';` situation? Does the server simply treat the string literal ('中文') as GBK-encoded, as if `character_set_client` and `character_set_connection` were both temporarily set to gbk (I am only talking about the effect, not how MySQL actually implements it)? – Jinke2017 Jul 14 '20 at 06:30

1 Answer

The difference between select _gbk '中文' and select '中文' is in how the bytes of the literal are interpreted. In select _gbk '中文', the '中文' transmitted from the client to the server is understood to be in the GBK character set, whereas in select '中文', it is understood to be in the character set given by character_set_client.
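
As a rough illustration of that difference, here is a minimal Python sketch that simulates the two cases at the byte level. It is a simplified model, not MySQL code: the function names `literal_without_introducer` and `literal_with_introducer` are hypothetical, and Python codec names stand in for MySQL character sets. It assumes that a literal without an introducer is converted from character_set_client to character_set_connection (as the quoted documentation says), and that a literal with an introducer keeps its bytes unchanged and is only labeled with the introducer's character set.

```python
# Simplified model of how a string literal's bytes might be handled.
# The function names are hypothetical; this is not how MySQL is implemented.

def literal_without_introducer(raw: bytes, charset_client: str,
                               charset_connection: str) -> tuple[str, bytes]:
    """Decode the bytes as character_set_client, re-encode as character_set_connection."""
    text = raw.decode(charset_client)
    return charset_connection, text.encode(charset_connection)


def literal_with_introducer(raw: bytes, introducer: str) -> tuple[str, bytes]:
    """Keep the bytes as sent and only label them with the introducer's character set."""
    return introducer, raw


# Assumed setup: the client really sends UTF-8 bytes, character_set_client is
# utf8mb4 (modelled by Python's "utf-8"), character_set_connection is gbk.
sent_bytes = "中文".encode("utf-8")

plain = literal_without_introducer(sent_bytes, "utf-8", "gbk")
tagged = literal_with_introducer(sent_bytes, "gbk")

# Without an introducer, the bytes are converted and still mean '中文' in GBK.
print(plain[0], plain[1].hex(), plain[1].decode(plain[0]))

# With _gbk, the UTF-8 bytes are read as GBK, so the text comes out garbled.
print(tagged[0], tagged[1].hex(), tagged[1].decode(tagged[0], errors="replace"))
```

Under these assumptions, `_gbk '中文'` only yields the intended text when the client actually sends GBK-encoded bytes. The introducer does not convert anything; it only declares how the bytes of that one literal should be read.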

chaos
  • Is there a relationship between the introducer `_gbk` and `character_set_client` and `character_set_connection`? Thanks for your reply. – Jinke2017 Jul 14 '20 at 06:42
  • Can it be said that if an introducer is used, the MySQL server will ignore the two system variables `character_set_client` and `character_set_connection` and use the character set `gbk` specified by the introducer `_gbk`? – Jinke2017 Jul 14 '20 at 06:55