
I would like to protect my users' usernames in an online service, as they may be personally identifying (e.g., an email address), but I am wondering if it's even possible...

My first inclination was to hash it (unsalted), but I am worried about possible hash collisions. I'm not so much worried about the probability of a random collision in a 32-byte (256-bit) SHA-256 hash, but more about the possibility that the class of usernames used could be just prone to collisions.
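To put a rough number on the "random collision" part, here is a minimal sketch of the standard birthday upper bound. It assumes the hash output behaves like a uniformly random 256-bit value; the function name and the one-billion-user figure are made up for illustration, and this says nothing about structural weaknesses for a particular class of inputs.

```python
from fractions import Fraction


def collision_probability_upper_bound(num_users: int, hash_bits: int = 256) -> Fraction:
    """Birthday upper bound: P(any collision) <= n * (n - 1) / 2^(bits + 1).

    Assumes the hash output is uniformly random over 2^bits values.
    """
    return Fraction(num_users * (num_users - 1), 2 ** (hash_bits + 1))


# Hypothetical service with one billion registered usernames.
bound = collision_probability_upper_bound(10**9)
print(float(bound))
```

Under that assumption the bound stays far below 10^-50 even at a billion users, so accidental collisions are not the practical concern.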

I also looked into perfect hashes, but as the users can be added dynamically, that's going to be too hard to manage.

Another option I've thought of was that (when adding the user) if there were a hash collision, I would reply to the client with a request for another hash, and repeat until there was no collision. I'd repeat this process during log-in. However, I am also wondering if this actually makes it easier for an attacker, as they'd have more feedback about what hashes were successful, and if the database were compromised, all the additional hashes would make recovering the original value easier.

I was also considering encrypting the username using the username as a password, but I'm guessing this also suffers from collisions (because each entry has a unique password--two different plain-texts with two different passwords could result in the same cipher-text), so I'm thinking it's not worth exploring this further.

I don't really want to go with a custom username (where the user has to come up with something that hasn't been taken when they sign up), as I'm expecting users to use the service very rarely, so they are likely to forget their username.

I'm currently thinking I will just go with the first idea of hashing once, and if there is a collision, have the password decide (and hope there's no collision there too--I could put a warning when the user signs-up saying that their username/password is not allowed because it will log them in as another user perhaps /S).

Is there any non-colliding way of creating a secure form of username?

Thank you.

Eliott
  • I don't like the sound of you not having a unique username. – KeithC May 04 '17 at 02:04
  • @KeithC Sorry, I see that is confusing. The username has to be unique (in the domain of usernames)--if it wasn't, it would be guaranteed to cause a collision. What I meant there is, I don't want it to be unique to the service. I'm not sure the correct way to state it, but maybe an example: I mean unique to the domain of usernames is like an email-address, unique to the service is like "mrjoe123". I don't want "mrjoe123" because the user will forget it. I've edited the question. Thank you. – Eliott May 04 '17 at 02:38
  • _"-- but more about the possibility that the class of usernames used could be just prone to collisions"_ - such a flaw would most likely have been discovered already, so I wouldn't worry about it. If you do encounter a collision, just inform the user that the username has already been taken. – 1615903 May 04 '17 at 06:05
  • @1615903 Thank you. I'm not sure I'd be able to say their username has already been taken, as I want it to be something personal to them (such as an email address). Does Murphy's Law apply here? – Eliott May 04 '17 at 08:18

1 Answer


I'm assuming we are talking about email addresses, as there aren't many other options usable as login names.

I was also considering encrypting the username using the username as a password, but I'm guessing this also suffers from collisions (because each entry has a unique password--two different plain-texts with two different passwords could result in the same cipher-text), so I'm thinking it's not worth exploring this further.

Collisions are the wrong thing to worry about here ...

Mandatory disclaimer: Encryption keys are not the same things as passwords. And encrypting the plainText with itself as the key is even worse.

The problem with encryption is that cipherTexts aren't searchable; i.e. you cannot check for uniqueness unless you decrypt all user records each time, so this just isn't sustainable - the work for every lookup grows linearly with the number of user records.
That's because encryption makes use of IVs (Initialization Vectors; i.e. roughly the equivalent of salts in password hashing), which result in a different cipherText even if you encrypt the same plainText twice using the same key.

However, it is very likely that you will need to encrypt those emails anyway, because if you need to send out password reset links, notifications, etc., you'll need a two-way mechanism. You can't do these things with hashes, because they are one-way only.
There's a reason why every website couples its user accounts with email addresses, even if they are not the login names. :)

What you can do for login checks only, is to store a HMAC (Hash-based Message Authentication Code) of the email.

HMACs look just like regular hashes, but are actually "keyed hashes" (i.e. you use a secret key while hashing, similarly to encryption). In addition, no practical way of finding collisions in the HMAC construct without knowing the key has been published so far, even for HMAC with the now famously insecure MD5 (still, please use a modern algorithm; at least SHA-2).
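As a rough illustration of the keyed hash described above, here is a minimal sketch using Python's standard `hmac` and `hashlib` modules with HMAC-SHA256. The function name `email_lookup_token` and both keys are made up for the example; a real key would be random bytes loaded from your application's secret configuration.

```python
import hashlib
import hmac


def email_lookup_token(email: str, hmac_key: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the email under the given key."""
    return hmac.new(hmac_key, email.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical keys, for illustration only.
key_a = b"example-key-A-not-for-production"
key_b = b"example-key-B-not-for-production"

token_a = email_lookup_token("alice@example.com", key_a)
token_b = email_lookup_token("alice@example.com", key_b)

print(token_a)             # stable for the same email and key
print(token_a == token_b)  # a different key gives an unrelated value
```

The same input under the same key always gives the same token, which is what makes it usable as a lookup value, while someone without the key cannot compute tokens for candidate emails.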

I should note that HMACs aren't nearly as strong as password hashing algorithms, so your users' emails certainly won't be as strongly protected as their passwords, but there isn't much else you can do about that, and it should be OK.


In summary, you'll need to have two separate cryptographic keys configured in your application - one for encryption, and one for the HMACs - and the following data stored:

  • userLoginLookup - HMAC of the email, using one of the two keys
  • userLoginMailer - cipherText of the email, using the second configured key
  • userPassword - a standard password hash; using bcrypt, PBKDF2 or scrypt
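For the userPassword field, a minimal sketch with PBKDF2 from Python's standard library could look like the following. The function names, the 16-byte salt and the iteration count are assumptions for illustration, not recommendations from this answer; tune the count for your hardware, or use a bcrypt or scrypt library instead.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for PBKDF2-HMAC-SHA256; adjust for your environment.
PBKDF2_ITERATIONS = 600_000


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = PBKDF2_ITERATIONS) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = PBKDF2_ITERATIONS) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt, iterations)
    return hmac.compare_digest(derived, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
```

Unlike the HMAC lookup value, the password hash is salted per user, so two users with the same password get different stored values.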

Note: Cryptographic functions operate on exact bytes, so they are always case-sensitive. To accommodate lookups, you need to always normalize the email addresses first; i.e. make them all-lowercase or all-uppercase.
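A small sketch of why the normalization matters, again with HMAC-SHA256 from the standard library. The helper names and the key are hypothetical, and trimming surrounding whitespace is an extra assumption on top of the lowercasing mentioned above.

```python
import hashlib
import hmac

# Hypothetical key, for illustration only.
HMAC_KEY = b"example-hmac-key-not-for-production"


def normalize_email(email: str) -> str:
    """Lowercase the address and trim surrounding whitespace (an added assumption)."""
    return email.strip().lower()


def raw_token(email: str) -> str:
    """HMAC of the input exactly as typed."""
    return hmac.new(HMAC_KEY, email.encode("utf-8"), hashlib.sha256).hexdigest()


def normalized_token(email: str) -> str:
    """HMAC of the normalized input."""
    return raw_token(normalize_email(email))


# Without normalization the two spellings look like two different users.
print(raw_token("Alice@Example.com") == raw_token("alice@example.com"))
# With normalization they map to the same lookup value.
print(normalized_token("Alice@Example.com") == normalized_token("alice@example.com"))
```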

When a user attempts to login, you do a HMAC(emailInput, hmacKey) and search for a match with the userLoginLookup field in your database.
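The lookup step can be sketched as below, with an in-memory dictionary standing in for the database table. `UserStore`, its methods and the key are hypothetical names for this example; in a real schema the token would be a uniquely indexed userLoginLookup column.

```python
import hashlib
import hmac
from typing import Dict, Optional


def lookup_token(email: str, hmac_key: bytes) -> str:
    """HMAC-SHA256 of the lowercased, trimmed email."""
    normalized = email.strip().lower()
    return hmac.new(hmac_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


class UserStore:
    """Toy stand-in for a table keyed by the userLoginLookup value."""

    def __init__(self, hmac_key: bytes) -> None:
        self._key = hmac_key
        self._records: Dict[str, dict] = {}

    def register(self, email: str, record: dict) -> bool:
        """Store the record; return False if the email is already registered."""
        token = lookup_token(email, self._key)
        if token in self._records:
            return False
        self._records[token] = record
        return True

    def find(self, email: str) -> Optional[dict]:
        """Return the record for this email, or None if there is no match."""
        return self._records.get(lookup_token(email, self._key))


store = UserStore(b"example-hmac-key-not-for-production")
store.register("alice@example.com", {"display_name": "Alice"})
print(store.find("ALICE@example.com"))
print(store.find("mallory@example.com"))
```

Because the token is deterministic, the uniqueness check at sign-up and the search at login are both a single indexed lookup, with no need to decrypt anything.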

When you need to send a notification or password reset email, you decrypt the userLoginMailer.

Narf
  • Thank you for your answer. I've been reading up a bit on HMACs today and have a few questions: 1) if the secret key is compromised, are all the existing HMACs also compromised, or is it just that HMACs could be impersonated; 2) if the MAC is computed with a hash algorithm that can have collisions, does that increase the possibility that the MAC could also collide; and 3) is the reason no one has found collisions that they've looked and not found any, or is it that MACs aren't meant to be used as indexes (which hashes are), so testing for collisions in MACs isn't considered? – Eliott May 04 '17 at 16:41