How can we save emojis in SQL database to preserve their uniqueness for equality comparisons?

Question

We have an issue when we save emojis to our database.

We have altered our table column to a character set and collation that should save enough bytes to distinguish between emojis.

This was our query to modify the table column:

ALTER TABLE TableName MODIFY TableColumn VARCHAR(2048) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

When we save the following emojis, the binary values are as follows:

 ♥️  | e2 99 a5 ef b8 8f
 | f0 9f 90 92
 | f0 9f 90 b5

Some emojis (♥️) save with binary information that allows us to distinguish them as unique/separate emojis.

Since we do not want to save the same emoji twice, for our use case, we are checking for the 'value' in our database. If the value exists, we do not save the emoji.

However, many emoji values, including and , are treated as the same value.

Therefore, when we attempt to save and , only one gets saved...

Why are they treated as the same value?

How can we save and as identifiably different emojis, using SQL?

Is there a better char set or collation we could use?

We are accessing a MySql and MSSql databases via separate repositories, written in C# in a .NET framework project.

Which database system are you using? MySQL? Oracle? SQL Server? Please just add a tag of the respective system by editing the question. From which environment are you accessing the database? PHP? .net? Java? Please add that information to your question. Also, could you add the connection string that you use to connect to the database, and potentially any property settings of the connection which you change after connecting - if this is the case. — FrankPl, Apr 26 '21 at 13:43
I have updated the question. We are using MySql and MSSQL and accessing them via repositories written in C# in a .NET framework project, using ASP.NET. The connection strings vary depending on the state of the project, and the database in use at the time. We have no issues regarding our db connections. — Tom Francis, Apr 27 '21 at 11:15
I am not sure what causes the problem then, Maybe a bug in string handling in the SQL interface for some Unicode values above `ff ff`? Interesting that bit happens with the same effects with two different databases, which both have their own routines for this type of string handling.. — FrankPl, Apr 28 '21 at 16:23

score 0 · Answer 1 · answered Apr 22 '21 at 11:52

0

Actually, your code for the sitting monkey is wrong! That would be f0 9f 90 92, not f0 9f 90 b5. See https://www.fileformat.info/info/unicode/char/1f412/index.htm, the UTF-8 hex code. The other two codes seem correct.

Whatever you did, the error is not on the side of the code you showed.

answered Apr 22 '21 at 11:52

FrankPl

919
3
12

You are absolutely right, the last byte is indeed different. I have updated the question, since both values `f0 9f 90 92` and `f0 9f 90 b5` are still not being treated as **different** values in SQL. – Tom Francis Apr 26 '21 at 13:03

How can we save emojis in SQL database to preserve their uniqueness for equality comparisons?

1 Answers1