0

I have a MySQL database that is all in utf8_general_ci charset.

The only exception is the field subject in the newsletter table that has charset utf8mb4_general_ci.

This is done to allow storing emoji utf8 chars.

When via a php script I establish a connection to the db I run

mysqli_set_charset($link, 'utf8mb4')

this way I am also able to display the utf8 emoji chars.

So, my website exposes urls like this:

https://example.org/my-slug

that internally are translated in a query like this:

SELECT * FROM page WHERE slug = 'my-slug'

The problem is that sometimes (expecially from Bing, Microsoft IPs) I receive visits to urls like

https://example.org/my-slug%0cL%ce%0c%cd%cc%8c%8d%0cL

that originates this query

SELECT * FROM page WHERE slug = 'my-slug^LL<CE>^L<CD>̌<8D>^LL'

that fails with the error

PHP User Warning - Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '='

Is there a way solve this error:

  • without modifying the query
  • without changing the all database charset from utf8_general_ci to utf8mb4_general_ci
  • without losing the ability to save and display utf8 emoji

?

nulll
  • 1,465
  • 1
  • 17
  • 28

1 Answers1

0

%0cL%ce%0c%cd%cc%8c%8d%0cL is gibberish that does not map to meaningful information in any character set that I have tested.

Shift-JIS, for example, interprets it as Lテヘフ訣L; does that make sense in Japanese? Latin1: LÃÍÌŒL. It breaks utf8 and utf8mb4. Koi8r and koi8uLцмл▄█L.

The hex for that string has an unusually large number of C: 0c4cc30ccdcc8c8d0c4c

If you must deal with such strings, then I suggest you use VARBINARY or BLOB (depending on size) for the column datatype. This has the drawback of not doing case folding, but maybe that is not relevant.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • With the same gibberish url, the error does not occur if I set the connection charset to `mysqli_set_charset($link, 'utf8')` – nulll Feb 06 '19 at 10:50