37

Is there any input for which SHA-1 computes a hex value of forty zeros, i.e. "0000000000000000000000000000000000000000"?

mdb
mckamey
  • What's so special about forty zeros. How is this programming related? – Mehrdad Afshari Dec 14 '09 at 17:41
  • 12
    Maybe the OP is using all zeros as a special flag or something in his program – Earlz Dec 14 '09 at 17:43
  • 5
    Was wondering *if* I could use it as a sentinel value. – mckamey Dec 15 '09 at 01:58
  • 3
    As one example, Mercurial is using special all-zeros SHA1 as [nullid](http://mercurial.selenic.com/wiki/Nodeid) – ash108 Feb 08 '12 at 08:20
  • @ash108 interesting. Seems they should have looked at this SO question! I asked because I had a similar use but knowing that something could legitimately hash to it was a deal breaker, even if highly improbable. For the hypothetical person who experiences that collision, it would be extremely bad. – mckamey Feb 09 '12 at 19:37
  • 1
    @mckamey chances are if this is bad for you to get a null hash recognized as a sentinel when it is in fact a legitimate hash value, it means you already count on the fact that sha1 has no collisions. which is not true by pigeonhole principle. so your concern goes straight to /dev/null – v.oddou Dec 10 '15 at 09:05
  • So does [git](https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols). If you were to get a commit with a zero hash, it would delete the branch on remote. – Marko Aug 19 '22 at 14:23

7 Answers

22

Yes, it's just incredibly unlikely. I.e. one in 2^160, or 0.00000000000000000000000000000000000000000000006842277657836021%.
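A minimal sketch of where that figure comes from, assuming SHA-1 output behaves like a uniformly random 160-bit value (a modeling assumption, not a proven property). The names `p_exact` and `p_float` are purely illustrative.

```python
from fractions import Fraction

# Probability that a uniformly random 160-bit digest equals one fixed value.
p_exact = Fraction(1, 2**160)
p_float = float(p_exact)          # 2**-160 is a power of two, so this is exact

print(p_exact.denominator)        # 2**160 as an integer
print(p_float)                    # the probability as a float
print(f"{p_float * 100:.3e} %")   # the same value expressed as a percentage
```

This is the chance for a single hash of random input; it says nothing about whether such an input actually exists.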

daf
  • 5
    as for any other hash I suppose, no? – SilentGhost Dec 14 '09 at 17:43
  • 8
    Yeah — in fact, a uniform probability distribution of hash values is one of the defining characteristics of a good hash function. – daf Dec 14 '09 at 18:22
  • 2
    Again, that is not guaranteed to happen. – NullUserException Sep 12 '11 at 04:26
  • 5
    If the output maps evenly between [1, 2^160], that would be uniform but still not include 0. – Aaron Digulla Sep 24 '13 at 13:38
  • @AaronDigulla I think it is virtually impossible to get a clean uniform distribution of hashes in [1, 2^n] because of the same thing that's written in linux man of rand() function. you risk to create a step in the distribution. Usual workaround is russian roulette, in this case you need to introduce a "if 0" and rehash with a new golden value as seed combiner. This introduces a risk of a flaw. unacceptable. therefore the range is [0, 2^n] – v.oddou Dec 10 '15 at 09:17
15

Also, because SHA1 is cryptographically strong, it would be computationally infeasible (at least with current computer technology; all bets are off for emergent technologies such as quantum computing) to find out what data would result in an all-zero hash until it occurred in practice. If you really must use the "0" hash as a sentinel, be sure to include an appropriate assertion (that you did not just hash input data to your "zero" hash sentinel) that survives into production. It is a failure condition your code will permanently need to check for. WARNING: If real input ever does hash to the sentinel, your code will be permanently broken.

Depending on your situation (if your logic can treat the empty string as a special case and forbid it as input), you could use the SHA1 hash of the empty string ('da39a3ee5e6b4b0d3255bfef95601890afd80709') as the sentinel. Another option is the hash of any string outside your input domain, such as sha1('a') if your input is invariably numeric-only. If the input is preprocessed to add some regular decoration, then a hash of something without the decoration would work as well (e.g. sha1('abc') if inputs like 'foo' are decorated with quotes to become '"foo"').
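As a rough sketch of the sentinel-plus-check idea described above, using Python's standard `hashlib`. The names `EMPTY_SHA1`, `ZERO_SHA1` and `checked_sha1` are hypothetical, and the choice of the empty-string hash as the default sentinel assumes empty input is rejected upstream.

```python
import hashlib

# Candidate sentinels: the empty-string digest and the all-zero digest.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()
ZERO_SHA1 = "0" * 40

def checked_sha1(data: bytes, sentinel: str = EMPTY_SHA1) -> str:
    """Hash data and fail loudly if the digest equals the reserved sentinel."""
    digest = hashlib.sha1(data).hexdigest()
    if digest == sentinel:
        # A real exception rather than `assert`, so the check is not
        # stripped when Python runs with -O.
        raise ValueError("input hashes to the reserved sentinel value")
    return digest

print(EMPTY_SHA1)
print(checked_sha1(b"abc"))
```

Passing `sentinel=ZERO_SHA1` gives the all-zero variant of the same check.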

Brian Jack
  • 2
    Surely the SHA1 hash of an empty string has the same problem as the all-zeroes hash, i.e. that it's at least in principle possible that some other input would yield the same result. – Jeremy Friesner Jan 13 '22 at 05:34
11

I don't think so.

There is no easy way to show why it's not possible. If there were, it would itself be the basis of an algorithm to find collisions.

Longer analysis:

The preprocessing makes sure that there is always at least one 1 bit in the input.

The loop over w[i] will leave the original stream alone, so there is at least one 1 bit in the input (words 0 to 15). Even with clever design of the bit patterns, at least some of the values from 0 to 15 must be non-zero since the loop doesn't affect them.

Note: leftrotate is circular, so no 1 bits will get lost.

In the main loop, it's easy to see that the constant k is never zero, so temp can never be zero simply because all operands on the right-hand side are zero (k never is).

This leaves us with the question whether you can create a bit pattern for which (a leftrotate 5) + f + e + k + w[i] returns 0 by overflowing the sum. For this, we need to find values for w[i] such that w[i] = 0 - ((a leftrotate 5) + f + e + k)

This is possible for the first 16 values of w[i] since you have full control over them. But the words 16 to 79 are again created by xoring the first 16 values.

So the next step could be to unroll the loops and create a system of linear equations. I'll leave that as an exercise to the reader ;-) The system is interesting since we have a loop that creates additional equations until we end up with a stable result.

Basically, the algorithm was chosen in such a way that you can create individual 0 words by selecting input patterns but these effects are countered by xoring the input patterns to create the 64 other inputs.

Just an example: To make temp 0, we have

a = h0 = 0x67452301
f = (b and c) or ((not b) and d)
  = (h1 and h2) or ((not h1) and h3)
  = (0xEFCDAB89 & 0x98BADCFE) | (~0xEFCDAB89 & 0x10325476)
  = 0x98badcfe
e = 0xC3D2E1F0
k = 0x5A827999

which gives us (a leftrotate 5) + f + e + k = 0x9fb498b3 and therefore w[0] = 0 - 0x9fb498b3 = 0x604b674d, etc. This value is then used in the words 16, 19, 22, 24-25, 27-28, 30-79.

Word 1, similarly, is used in words 1, 17, 20, 23, 25-26, 28-29, 31-79.

As you can see, there is a lot of overlap. If you calculate the input value that would give you a 0 result, that value influences at least 32 other input values.
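The arithmetic in this example can be checked with a short sketch. It assumes the first round of the first block (so a through e are the standard SHA-1 initial values) and treats "depends on" purely structurally, ignoring any XOR cancellation. The names `rotl`, `fixed` and `depends` are illustrative only.

```python
MASK = 0xFFFFFFFF

def rotl(x: int, n: int) -> int:
    """32-bit circular left rotation."""
    return ((x << n) | (x >> (32 - n))) & MASK

# SHA-1 initial state and the round constant used for rounds 0..19.
a, b, c, d, e = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0
k = 0x5A827999

f = (b & c) | (~b & d & MASK)
fixed = (rotl(a, 5) + f + e + k) & MASK   # everything in temp except w[0]
w0 = (-fixed) & MASK                      # the value that cancels it mod 2**32
temp = (fixed + w0) & MASK

print(hex(f), hex(fixed), hex(w0), temp)

# Which schedule words are built (directly or indirectly) from w[0]?
# w[t] is derived from w[t-3], w[t-8], w[t-14] and w[t-16].
depends = {0}
for t in range(16, 80):
    if any(t - off in depends for off in (3, 8, 14, 16)):
        depends.add(t)

print(sorted(depends - {0}))
```

Zeroing temp in round 0 is therefore easy, but the chosen w[0] feeds most of the later schedule words, which is the overlap described above.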

Aaron Digulla
10

The post by Aaron is incorrect. It is getting hung up on the internals of the SHA1 computation while ignoring what happens at the end of the round function.

Specifically, see the pseudo-code from Wikipedia. At the end of the round, the following computation is done:

h0 = h0 + a
h1 = h1 + b 
h2 = h2 + c
h3 = h3 + d
h4 = h4 + e

So an all 0 output can happen if h0 == -a, h1 == -b, h2 == -c, h3 == -d, and h4 == -e going into this last section, where the computations are mod 2^32.
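To make the modular condition concrete, here is a small sketch for the single-block case, where h0 through h4 going into the final addition are the standard initial constants. It only computes which working values a through e would be required; it does not show that any message actually produces them. `H_INIT`, `needed` and `result` are illustrative names.

```python
MASK = 0xFFFFFFFF
H_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

# Working values a..e that would zero each chaining word after the final add.
needed = tuple((-h) & MASK for h in H_INIT)

# The final step of the block: h_i = (h_i + value_i) mod 2**32.
result = tuple((h + v) & MASK for h, v in zip(H_INIT, needed))
digest = "".join(f"{r:08x}" for r in result)

print([f"{v:08x}" for v in needed])
print(digest)
```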

To answer your question: nobody knows whether there exists an input that produces an all-zero output, but cryptographers expect that one exists, based upon the simple argument provided by daf.

Community
TheGreatContini
  • exactly what I thought. the chance of having a hash that results in zero exists even if we start from some golden value with ones in it, because of modulos everywhere in the calculations. in the end daf simple math should be correct based on his comment "a good hash function property is to be uniform". – v.oddou Dec 10 '15 at 09:15
5

Without any knowledge of SHA-1 internals, I don't see why any particular value should be impossible (unless explicitly stated in the description of the algorithm). An all-zero value is no more or less probable than any other specific value.

DevSolar
  • 3
    Yes I guess that is a better way to ask the same question: is there anything in the SHA-1 algorithm which explicitly excludes this value. – mckamey Dec 15 '09 at 01:59
2

Contrary to all of the current answers here, nobody knows that. There's a big difference between a probability estimation and a proof.

But you can safely assume it won't happen. In fact, you can safely assume that just about ANY value won't be the result (assuming it wasn't obtained through some SHA-1-like procedures). You can assume this as long as SHA-1 is secure (it actually isn't anymore, at least theoretically).

People don't seem to realize just how improbable it is (if all of humanity focused all of its current resources on finding a zero hash by brute force, it would take about xxx... ages of the current universe to crack it).

If you know the function is safe, it's not wrong to assume it won't happen. That may change in the future, so assume some malicious inputs could give that value (e.g. don't erase user's HDD if you find a zero hash).

If anyone still thinks it's not "clean" or something, I can tell you that nothing is guaranteed in the real world, because of quantum mechanics. You assume you can't walk through a solid wall just because of an insanely low probability.

[I'm done with this site... My first answer here, I tried to write a nice answer, but all I see is a bunch of downvoting morons who are wrong and can't even tell the reason why are they doing it. Your community really disappointed me. I'll still use this site, but only passively]

  • With all due respect, this wasn't a philosophical question. – mckamey Jan 21 '13 at 20:19
  • @mckamey What do you mean philosophical? I thought my answer was based on reality. And to people downvoting this. Care to tell what you disagree with and why? – user1947100 Jan 29 '13 at 23:08
  • 1
    By the way I'm pretty sure these assumptions are quite common, for example do you think GUIDs are guaranteed to be unique? Of course not. But you can safely assume so, because the probability of generating the same GUID twice is insanely low (but still much higher than the SHA-1 zero hash, because if you have many GUIDs and you have to compare every pair of them). – user1947100 Jan 29 '13 at 23:18
  • Small nitpick: you can assume that you will not get collisions (the point of the algorithm) only if you are hashing purely random numbers. Obviously, if you repeatedly hash some word, you will always get the same predictable hash back. In other words, the question is valid only if the premise of "a hash that we do not know the original message already" applies. (00..00 being such a hash) – sleblanc Sep 05 '15 at 19:40
-3

Contrary to all answers here, the answer is simply No.

The hash value always contains bits set to 1.

Marius Amado-Alves