1

I want to match filter results exactly using CHARACTER SET utf8mb4.

ALTER DATABASE test CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

CREATE TABLE users (    
username VARCHAR(25) NOT NULL,    
password VARCHAR(25) NULL NULL
) 
ENGINE=innodb DEFAULT CHARSET=utf8mb4 
COLLATE=utf8mb4_general_ci ROW_FORMAT=DYNAMIC;
CREATE  INDEX  users_username_idx ON users (username); 

I inserted 1 million rows also containing below rows.

INSERT INTO users SELECT 'üsasdasd','somepassword';
INSERT INTO users SELECT 'usasdasd','somepassword';
INSERT INTO users SELECT 'pğasdasdasd','somepassword';
INSERT INTO users SELECT 'anfüs','somepassword';
INSERT INTO users SELECT 'anfus','somepassword';

Then applied some queries.

SELECT * FROM users WHERE username LIKE 'üs%';
SELECT * FROM users WHERE username LIKE 'us%';
SELECT * FROM users WHERE username LIKE 'pğ%';
SELECT * FROM users WHERE username = 'anfüs';
SELECT * FROM users WHERE username = 'anfus';

Why do i get results that ALSO have ü letter(like üsasdasd or anfüs) when i filter as "LIKE 'us%'" or " = 'anfus'"
OR Why do i get results that ALSO have u letter(like usasdasd or anfus) when i filter as "LIKE 'üs%'" or " = 'anfüs'" ?

How can i get results that ONLY have ü letter(like üsasdasd or anfüs) when i filter as "LIKE 'üs%'" or " = 'anfüs'" vice versa ?

ANSWER:

I dont know why but, COLLATE=utf8mb4_0900_as_cs fixed the issue.

lowdegeneration
  • 359
  • 5
  • 13

1 Answers1

1

For passwords, you want to ignore all case folding, accent stripping, etc, so use COLLATE utf8mb4_bin.

I suspect utf8mb4_0900_as_cs is not quite the same as _bin.

On the other hand, do you really want to store "plaintext" passwords? A common technique is to store a hash of the pwd and do the hash when testing. A simple hash that is secure from casual hackers:

MD5(CONCAT(password, 'my secret salt'))
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • yes utf8mb4_bin also worked for me. But what is the actual difference? is it case folding? – lowdegeneration May 02 '19 at 17:02
  • All `_bin` collations compare bit-for-bit. No case folding, no nothing. I have not researched the new-with-8.0 `..._as_cs` collations; the name implies "Accent Sensitive and Case Sensitive", but there are other things that Collations check. – Rick James May 02 '19 at 22:21