
I'm looking for a hashing algorithm that takes a 16-character string as input and outputs a different 16-character string that can't be converted back to the original.

I've thought of taking an MD5 result and keeping only the first 16 characters, but I don't think that is the right way to solve the problem, since it loses the hashing idea.

Any suggestions? The platform, if it matters, is Python.

user4939476
  • In what way would that lose the "hashing idea"? All good hash functions can be truncated like this and still retain all their properties (other than total entropy, obviously). – Phylogenesis May 26 '15 at 08:25
  • @Phylogenesis I have never used MD5 before, but I'm afraid that a small change in the source string (or a big change) will only affect the other 16 chars of the MD5 function output... – user4939476 May 26 '15 at 08:35
  • The way *cryptographic* hash functions work is that small changes to the input necessarily change the entire hash. If any part of the hash were predictable, the hash would be considered insecure. As another aside, if you Base64-encode your hash, you can retain 96 bits of information in the same 16 characters (rather than the 64 bits available with hex encoding) and it would still be printable. – Phylogenesis May 26 '15 at 08:38
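
A minimal sketch of the truncation discussed in these comments, assuming SHA-256 as the hash (MD5 would be sliced the same way). The function names `short_hash_hex` and `short_hash_b64` are made up for illustration. The first keeps 16 hex characters (64 bits), the second keeps 12 digest bytes (96 bits) and encodes them as 16 Base64 characters:

```python
import base64
import hashlib


def short_hash_hex(text: str) -> str:
    """Return the first 16 hex characters (64 bits) of the SHA-256 digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def short_hash_b64(text: str) -> str:
    """Return the first 12 digest bytes (96 bits) as 16 Base64 characters."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # 12 bytes encode to exactly 16 Base64 characters, with no '=' padding.
    return base64.urlsafe_b64encode(digest[:12]).decode("ascii")


if __name__ == "__main__":
    # Two inputs that differ only in the last character.
    for s in ("0123456789abcdef", "0123456789abcdeg"):
        print(s, short_hash_hex(s), short_hash_b64(s))
```

Running it on two inputs that differ in a single character is a quick way to check that the retained 16 characters change too, not just the discarded part.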

1 Answer


Actually, you already lose the hashing idea when you decide that the input size needs to match the output size, since, according to Wikipedia, "A hash function is any function that can be used to map digital data of arbitrary size to digital data of fixed size."

If you are building a credit card number tokenization system, just make up a random string: first check that the number has not already been tokenized, then check that the new token does not collide with an existing one, and save the original number in one of the ways allowed by the PCI standards (read them: https://www.pcisecuritystandards.org/documents/Tokenization_Guidelines_Info_Supplement.pdf). Then you are good to go.
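
A rough sketch of that random-token approach, under some loud assumptions: `TokenVault` is a hypothetical name, the two dictionaries stand in for whatever storage you really use, and nothing here addresses the PCI requirements for storing the original number.

```python
import secrets
import string

# Assumption: tokens are drawn from letters and digits.
ALPHABET = string.ascii_letters + string.digits


class TokenVault:
    """In-memory stand-in for a tokenization store (illustration only)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value, length=16):
        # Step 1: reuse the token if this value was already tokenized.
        existing = self._token_by_value.get(value)
        if existing is not None:
            return existing
        # Step 2: draw random tokens until one does not collide.
        while True:
            token = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if token not in self._value_by_token and token != value:
                break
        # Step 3: remember the mapping in both directions.
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token


if __name__ == "__main__":
    vault = TokenVault()
    print(vault.tokenize("4111111111111111"))
```

Because the token is random, it cannot be computed from the original without access to the vault, which is the property the question asks for.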

If not, a truncated hash function such as SHA-256 or MD5 will be repeatable outside your system too (anyone who applies the same function to the same input gets the same output) and carries a risk of collisions (that's part of it being a hash). Whether those properties make sense really depends on your use case.
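
To get a feel for the collision risk, here is a small estimate using the standard birthday-bound approximation, assuming the truncated hash behaves like a uniformly random value of the given bit length. The function name `collision_probability` is made up for this example.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Uses the birthday-bound approximation 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    exponent = -n_items * (n_items - 1) / 2 ** (bits + 1)
    # -expm1(x) equals 1 - exp(x) and stays accurate for tiny probabilities.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # 16 hex characters carry 64 bits; 16 Base64 characters carry 96 bits.
    for bits in (64, 96):
        print(bits, collision_probability(10**6, bits))
```

Whether the resulting probability is acceptable depends on how many strings you expect to hash and what a collision would cost you.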

Lynoure
  • A good cryptographic hash function (truncated or otherwise) will give exactly the same collision characteristics as a random string of the same length. – Phylogenesis May 26 '15 at 08:58
  • If you reread, you can see I'm only recommending random strings checked for uniqueness for the tokenization. When tokenizing numbers into characters, there is plenty of room for uniqueness. – Lynoure May 26 '15 at 10:00