9

Searching for the proper way to store BCrypt hashes in MySQL I found this question and it only made me more confuse.

The accepted answer point out that we should use:

CHAR(60) BINARY or BINARY(60)

But other people on the comments argue that instead we should use:

CHAR(60) CHARACTER SET latin1 COLLATE latin1_bin

or even:

COLLATE latin1_general_cs

I am not a specialist on databases so could anyone explain me the difference between all these options and which one is truly better for storing BCrypt hashes?

Community
  • 1
  • 1
Michel Feinstein
  • 13,416
  • 16
  • 91
  • 173
  • As far as I know, a cryptographic hash is a stream of bits (i.e. binary) but is often represented as plain text. What format do you need to store? – Álvaro González Mar 27 '17 at 15:30
  • Your question is dangerously close to *"Could someone please read the documentation on MySQL data types for me and make an executive summary?"* – Tomalak Mar 27 '17 at 15:46
  • @ÁlvaroGonzález yes, a hash is a stream of bits and yes its encoded as plain text, but all these storage options will lead to different types of behaviors on the database. – Michel Feinstein Mar 27 '17 at 16:08
  • @Tomalak that's really not my intention, I have read the documentation for BINARY and CHAR and the collation differences for _bin types, yet, I can't see the benefits of each other and how they compare, their caveats and surprises along the way, so my intention is actually to ask someone more experienced then I am on how these two options compare. – Michel Feinstein Mar 27 '17 at 16:10
  • @mFeinstein You should offer a bounty. I think it's an important question – Tek Aug 30 '17 at 06:47
  • Yeah, I think I will... But if you want to help, upvoting and favoring the question helps promoting it as well :) – Michel Feinstein Aug 30 '17 at 06:58

1 Answers1

9

My answer is in the line of "what is proper", rather than "what will work".

Do not use latin1. Sure, it might work, but it is ugly to claim that the encrypted string is text when it is not.

Ditto for saying CHAR....

Simply say BINARY(...) if fixed length or VARBINARY(...) if it can vary in length.

However, there is a gotcha... Whose BCrypt are you using? Does it return binary data? Or a hex string? Or maybe even Base64?

My above answer assumed it returns binary data.

If it returns 60 hex digits, then store UNHEX(60_hex_digits) into BINARY(30) so that it is packed smaller.

If it is Base64, then CHARACTER SET ascii COLLATE ascii_bin would be "proper". (latin1 with a case-sensitive collation would also work.)

If it is binary, then, again, BINARY(60) is the 'proper' way to do it.

The link you provided looks like Base64, but is it? And is it up to 60 characters? Then I would use

VARCHAR(60) CHARACTER SET ascii COLLATE ascii_bin

And explicitly state the charset/collation for the column, thereby overriding the database and/or table "defaults".

All the Base64 chars (and $) are ascii; no need for a more complex charset. Collating with a ..._bin means "compare bytes exactly"; more specifically "don't do case folding". Since Base64 depends on distinguishing between upper and lower case letters, you don't want case folding.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • I don't think we can consider bcrypt as a pure base64 string, since it will always have it's prefix "$2a$" or "$2b$" or "$2y$", which could vary in any future updates to the protocol, and the cost parameter (something like $10$). Currently, the hashes are 60 characters long, the salt and hash are base64, but the prefix isn't. – Michel Feinstein Aug 31 '17 at 22:27
  • @mFeinstein - That's OK. My recommendation still works (after I fixed a typo!) – Rick James Aug 31 '17 at 22:28
  • could you just elaborate on the pros of using ascii_bin collation? I am not a specialist on DBs, so I am not aware about all the benefits and protections provided. – Michel Feinstein Aug 31 '17 at 23:13
  • I added a paragraph. – Rick James Aug 31 '17 at 23:17