Why am I seeing "COLLATION 'xxx' is not valid for CHARACTER SET 'yyy'"

Question

I am on MySQL 5.6.22 (InnoDB) on Amazon RDS. I have attempted to set all of my tables, columns, connection and database charset and collation settings to utf8mb4 / utf8mb4_unicode_ci. I can find no evidence anywhere that anything has charset latin1, yet when I execute the following code (either via node-mysql, or directly in "Sequel Pro" app on my Mac):

update MyTable m 
set m.Column8 = 1
where m.Column3 = 26 and m.Column4 = 76
collate utf8mb4_unicode_ci

I get this error message:

COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'latin1'

I cannot find anything set to latin1 in my configuration.

Output of show variables like "char%":

character_set_client        utf8mb4
character_set_connection    utf8mb4
character_set_database      utf8mb4
character_set_filesystem    utf8mb4
character_set_results       utf8mb4
character_set_server        utf8mb4
character_set_system        utf8
character_sets_dir          /rdsdbbin/mysql-5.6.22.R1/share/charsets/

Output of show variables like "collation%":

collation_connection        utf8mb4_unicode_ci
collation_database          utf8mb4_unicode_ci
collation_server            utf8mb4_unicode_ci

MyTable's CREATE TABLE info is:

CREATE TABLE `MyTable` (
  `Column1` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `Column2` varchar(12) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `Column3` bigint(20) unsigned NOT NULL,
  `Column4` bigint(20) unsigned NOT NULL,
  `Column5` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
  `Column6` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
  `Column7` varchar(112) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `Column8` tinyint(1) unsigned NOT NULL,
  `Column9` decimal(16,14) DEFAULT NULL,
  `Column10` decimal(17,14) DEFAULT NULL,
  `Column11` bigint(20) unsigned DEFAULT NULL,
  `Column12` bigint(20) unsigned DEFAULT NULL,
  `Column13` timestamp(6) NULL DEFAULT NULL,
  `Column14` timestamp(6) NULL DEFAULT NULL,
  `Column15` tinyint(4) NOT NULL DEFAULT '1',
  `Column16` tinyint(4) NOT NULL DEFAULT '1',
  `Column17` varchar(15) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `Column18` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `Column19` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`Column1`),
  KEY `IX_Reevues_Column3` (`Column3`),
  KEY `IX_Reevues_Column4` (`Column4`),
  KEY `IX_Reevues_Column6` (`Column6`),
  KEY `IX_Reevues_Column8` (`Column8`),
  KEY `IX_Reevues_Column2` (`Column2`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

[May be of help](https://github.com/santisaez/powerstack/issues/63). — potashin, Mar 08 '15 at 18:07
You are using character fields in WHERE clause. What is the purpose of 'collate' , out of curiousity? — Vladislav Vaintroub, Mar 08 '15 at 20:51
This SQL statement was the simplest version of my real statement that I could show that still triggered the error. The real statement normally does contain a comparison of two string values, so that's why the COLLATE statement was originally added. Are you suggesting that if no string values are being compared, the COLLATE statement should be removed, or it will generate this type of error? — Mason G. Zhwiti, Mar 08 '15 at 21:44

Rick James · Accepted Answer · 2015-03-10T05:22:34.053

8

Why have a COLLATE clause when comparing a BIGINTs compared to literal numbers? Remove the COLLATE clause in the UPDATE statement. -- This is the main solution, as per OP's comments.
Is the code inside a Stored Routine that was build with latin1? Do SHOW CREATE PROCEDURE (or FUNCTION) to see if that were the case. If so, then DROP and reCREATE it with utf8mb4 in force.
It is risky to change character_set_filesystem and character_set_server. Change them back.

edited Mar 10 '15 at 05:22

answered Mar 08 '15 at 21:18

Rick James

135,179
13
127
222

This SQL statement was the simplest version of my real statement that I could show that still triggered the error. The real statement normally does contain a comparison of two string values, so that's why the COLLATE statement was originally added. Are you suggesting that if no string values are being compared, the COLLATE statement should be removed, or it will generate this type of error? As for #2, I don't see any stored routines with latin1, but I did find some with utf8 instead of utf8mb4, so thank you for that tip. Can you elaborate on point #3? – Mason G. Zhwiti Mar 08 '15 at 21:51
1

You simplified it too much. Numbers don't need collation. Were the string values in columns or literals? – Rick James Mar 08 '15 at 22:16
#3 -- I don't know what impact it has. I have essentially never heard of needing to change them, nor seen anyone do them. – Rick James Mar 08 '15 at 22:17
You may be onto something here. In going through the history on this call, it looks like it was a copy/paste from a different call that was comparing a string value literal with a string column. COLLATE was there at the time for that, but then later the string comparison was removed, and the COLLATE remained. Do you think if no string columns were involved in the comparison, this is the type of error you'd end up seeing when using COLLATE? – Mason G. Zhwiti Mar 08 '15 at 22:31
It's still a mystery as to where "latin1" came from. Oh... Where is the UPDATE coming from? mysql commandline tool? phpmyadmin? PHP code? etc? – Rick James Mar 08 '15 at 22:34
Oops, I see now: " node-mysql, or directly in "Sequel Pro" app " -- check the settings in them. – Rick James Mar 08 '15 at 22:35
Yeah that's what I can't figure out, I've checked all these settings, overriding defaults when necessary, and can't find any references to latin1 anywhere. However, removing COLLATE from this call at least causes the warning to go away, and I'm not expecting any ill effects at this point since I see now this SQL call no longer compares strings. – Mason G. Zhwiti Mar 08 '15 at 22:37
Hmmm... my character_set_server = 'latin1', but yours is not. You might change that back -- to see what happens. – Rick James Mar 08 '15 at 22:42
1

@MasonG.Zhwiti: As to where `latin1` came from, I have filed a [bug report](http://bugs.mysql.com/bug.php?id=76215). – eggyal Mar 09 '15 at 01:58
1

@eggyal Thank you, so this confirms it: my issue is that I accidentally used COLLATE in a situation where there are no string comparisons, and the numbers being compared were being convert to strings on the fly, in the 'latin1' set, likely a bug in MySQL which you filed. If you want to post that as a separate answer, I'd like to split the bounty between you and Rick James. Thanks guys! – Mason G. Zhwiti Mar 09 '15 at 03:04
Aha! mysql> `SELECT 1 > 2 COLLATE utf8mb4_unicode_ci;` yields `ERROR 1253 (42000): COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'latin1'`. – Rick James Mar 09 '15 at 04:03
Are you saying that the over-simplification of the question led to a red herring?? – Rick James Mar 09 '15 at 04:03
@MasonG.Zhwiti: Yes, that's correct. You can't split bounties—award away to Rick. :) – eggyal Mar 09 '15 at 08:46
@RickJames Oh, no I was really seeing this in the field, too. ;-) But I was wrong in thinking I was still doing a string comparison in my real query. So my example code was right to highlight an incorrect use of COLLATE. ;) Going to award bounty now. Can you highlight in your answer that this specific issue was in fact the answer, and maybe link to eggyal's bug report? Thanks guys! – Mason G. Zhwiti Mar 09 '15 at 13:20
Need to wait 4 hours to award. – Mason G. Zhwiti Mar 09 '15 at 13:22
So this is five years old, but apparently still a bug. I’m currently being hit by this bug in a similar case where collation + numbers is sort of inevitable. I’m trying to find exact duplicates in a table where there are lots of heavily accented characters. The table is utf8mb4, so I need to collate by utf8mb4_bin to avoid, say, _*dʰéh₁(i)-lio-_ and _*dʰéh₁(i̯)-lio-_ being considered the same (they should be considered different). So I do `SELECT word FROM table GROUP BY word HAVING COUNT(word) > 1` – and get hit with this, because the COUNT() value is an int being compared with collation. :-/ – Janus Bahs Jacquet Oct 06 '20 at 18:04
@JanusBahsJacquet - Be aware that `i̯` is actually 2 characters, the letter `i` and a non-spacing COMBINING INVERTED BREVE BELOW. `'i' = 'i̯' COLLATE utf8mb4_unicode_520_ci` is TRUE for most collations, but not _general_ci and _bin. – Rick James Oct 06 '20 at 21:52
@RickJames That was just a random example – there are many more where both forms are single characters. I asked a question about it and someone suggested `GROUP BY HEX(word)` as a workaround – that way, no explicit collation is needed. – Janus Bahs Jacquet Oct 06 '20 at 21:55
@JanusBahsJacquet - Are you using 5.6 like the OP? If not, perhaps it is worth starting a new Question. Note also that `COUNT(word)` counts the word as long as `word IS NOT NULL`. Perhaps you want `COUNT(DISTINCT word)`, which counts the number of distinct values of `word`. (Then we get into dissecting how `DISTINCT` works with `COLLATIONs`.) – Rick James Oct 06 '20 at 22:00
@RickJames I already did ask a new question, that’s where I got that comment. :-) Yes, unfortunately on 5.6 and unable to upgrade; seems the bug was fixed in 5.7, according to the MySQL bug log. `word` is never `NULL`, so within the grouper result set, it’s fine that all non-null instances are counted. – Janus Bahs Jacquet Oct 06 '20 at 22:02
@JanusBahsJacquet - Could you provide a link to that other Question. – Rick James Oct 06 '20 at 22:10
Here it is: https://stackoverflow.com/questions/64231720 (copy-pasting is a pain on the iOS app – doesn’t work half the time, freezes the app the other half…). – Janus Bahs Jacquet Oct 06 '20 at 22:14

Why am I seeing "COLLATION 'xxx' is not valid for CHARACTER SET 'yyy'"

1 Answers1

Linked