Semantically secure algorithms can create many different ciphertext values for any single plaintext, mostly through the use of a salt or initialization vector. However, that is an obvious issue for detecting duplicate values in a column, as there won't be any!
For example, when I use AES-256-CBC to store a base64-encoded value in <iv>$<ciphertext>
format into a table, plaintext "SMITH, JOHN" may be stored as
fnhY58/hCd78VjY51YL8QA==$B9W1bEUWkK4gqyWlzJULMA==
9d9p9/L3Kla9oA/pOUFo1g==$5notB0QSZaGJ+/M5dpTJ/Q==
Y3NETmg53010roB6iMBMVA==$55Ap+vKHYqLlUDNzAhAwZg==
...
If that column is marked as a UNIQUE index or a PRIMARY KEY, duplicate detection and suppression (e.g. INSERT ON DUPLICATE KEY UPDATE) will silently fail, as the ciphertexts will obviously be different.
(I guess this may be the reason why MariaDB's built-in AES functions use ECB mode, and MySQL's default to it unless you tell them otherwise...)
Is there any practical way of using semantically secure encryption while preserving duplicate detection?
I guess one workaround would be to create an additional column with a securely derived hash of the plaintext using a KDF (using identical salt and iterations parameters), and using that for duplicate detection. But does that not defeat the point of semantic security by allowing the attacker to recognize related ciphertexts?
I imagine this use case is not uncommon, so I'm guessing there is an established workaround that I'm just not aware of. Thanks in advance for the advice!