47

In xkcd comic #936, Randall Munroe claims that a password like "Tr0ub4dor&3" (an uncommon base word with capitalization, common letter substitutions, and a number and punctuation suffix) has ~28 bits of entropy, while four random common words, like "correct horse battery staple", have ~44 bits of entropy and are therefore much stronger.

XKCD Mouseover: To anyone who understands information theory and security and is in an infuriating argument with someone who does not (possibly involving mixed case), I sincerely apologize.

I am confused because I've always been told that having numbers, cap/non-cap letters and special characters was essential for a strong password...

Is XKCD right on this?

Evorlor
  • I've heard the same claimed elsewhere. –  May 27 '17 at 17:02
  • 16
    I think this would be more on-topic for Math.SE or Security.SE. It needs to be answered with a mathematical calculation rather than empirical evidence. – Nate Eldredge May 27 '17 at 17:43
  • 70
    This would be a duplicate on [Security.SE](https://security.stackexchange.com/q/6095/61898). – Brythan May 27 '17 at 18:45
  • 4
    @Brythan: Even better then, it means it's already been answered! – Matthieu M. May 27 '17 at 19:17
  • 66
    Not to mention it's [the highest voted question on Security](https://security.stackexchange.com/questions?sort=votes). -1 for serious lack of research effort (and low usefulness given how much higher quality the answers from the appropriate site are / can be). Probably the rare case for supporting [this](https://meta.stackexchange.com/questions/16703/the-ability-to-link-cross-site-duplicates), too. – Jason C May 27 '17 at 20:29
  • 2
    Worth noting that this math applies if you stick with the assumptions made, but the entropy can be much higher if other common habits were included, e.g. using made-up words, proper nouns, two-word combos, inserting extra characters rather than doing 1337-substitutions, etc.. That said, xkcd's main point was to promote more human-friendly passwords. – Nat May 28 '17 at 00:59
  • 18
    I'm voting to close this - it is Security Stack Exchange's top voted question and all the specifics have been covered there by top cryptographers and security professionals. That should be the canonical post. – Rory Alsop May 28 '17 at 07:33
  • There are two rather conflicting criteria for a "strong" password: 1) Difficulty of "guessing", using knowledge about the individual. 2) Difficulty of finding with an exhaustive search of all possible combinations. Most "rules" for passwords seek to strengthen password with regard to the first criterion, but those "rules" tend to weaken it with regard to the second criterion. And in the end there's no way to prevent weakening due to stupidity. – Daniel R Hicks May 28 '17 at 17:51
  • 1
    The problem with this kind of thinking is that "choosing a password" is irrelevant. You need to solve the problem of choosing hundreds of passwords, one for each site. Neither of these systems scales (can you really recollect hundreds of "correct battery horse staple" passphrases?) – Sklivvz May 28 '17 at 18:06
  • 2
    https://www.explainxkcd.com/wiki/index.php/936:_Password_Strength has an interesting discussion about this. – Roman Gräf May 28 '17 at 18:38
  • 2
    Ever heard about "threat models"? This question is nonsense, and is about as answerable as "what is heavier - apple or orange"? "Security" is not quantifiable, it is abstract. Btw, xkcd is *satire*. It is meant to make fun of misconceptions like these. – user1643723 May 29 '17 at 00:07
  • 3
    I disagree that "answered on another SE site" is a valid reason to VTC - each site garners different answers for different audiences - The Skeptics answers here do not need to go deep into the technical detail like the Sec.SE post - instead they should reference multiple studies, published papers and the like, including *but not limited to* the Security.SE post, and draw a conclusion as to the *accuracy of xkcd's claim*, as per the [Tour]: *"Ask about...the accuracy of [public claims](http://meta.skeptics.stackexchange.com/questions/864/28182) made in the media or elsewhere"* – Robotnik May 29 '17 at 00:28
  • 1
    @Robotnik: Problem is, this question boils down to "Give me sources that 44 is greater than 28", and that's just an elementary mathematical fact. The question could have asked whether the entropy calculations are correct, but didn't, and that still would be a topic for information theory, not Skeptics. Skeptics really is best for revealing the supporting argument when a claim is given that separates a conclusion from its argument. That's not the case here, the argument and conclusion are in the same context. – Ben Voigt May 29 '17 at 18:13
  • 1
    See [my Dicewords generator](https://CJSHayward.com/passwords/). Note that it may produce weaker passwords from relying on built-in pseudo-randomness rather than a greater source of entropy... – Christos Hayward May 29 '17 at 19:38
  • @Ben - You're assuming password length is the only thing that matters, it's not. If everyone in the world used 4 simple random words then attackers would simply start making tools that assume you string together 4 simple random words, making your password only slightly stronger than a 4-character-long password. Point is, there are *plenty* of studies and opinions in this space, and it is most certainly not enough to simply say "44 > 28." – Robotnik May 29 '17 at 23:32
  • 1
    Vote for the [crossover questions feature request](https://meta.stackexchange.com/q/199989/191265) if you would be interested in seeing questions shared across sites, rather than having cross-site duplicates. – Thunderforge May 30 '17 at 00:20
  • @Robotnik: Considering that several human languages have a character for each concept we'd describe as a word, a 4 character long password is exactly what that is. But a 4 Chinese character password is potentially very strong. Anyway, if you're going to challenge the entropy computation, you need to be asking on a site where information theory is on topic. – Ben Voigt May 30 '17 at 02:15
  • 1
    _"I've always been told that having numbers, cap/non-cap letters and special characters was essential for a strong password"_ Yes, that is the entire premise of the comic. – Lightness Races in Orbit May 30 '17 at 09:41
  • @user1643723: _"Btw, xkcd is satire."_ No, it isn't. _"It is meant to make fun of misconceptions like these."_ No, it isn't. – Lightness Races in Orbit May 30 '17 at 09:43
  • 1
    This has been earning flags arguing it is a duplicate of a [Security.SE question](https://security.stackexchange.com/questions/6095/xkcd-936-short-complex-password-or-long-dictionary-passphraseWhich). Our normal practice is to allow such duplicates, as the community standards for answering are very different between the sites. If you disagree, please raise a meta-question discussing the policy, so it can be reconsidered. – Oddthinking May 30 '17 at 17:56
  • Why is this notable? The whole point of the contortions one goes through to create a "strong" password is to have something that will be both (A) memorable to the user and (B) difficult to hack/steal/guess/crack. Yes, four completely random words would probably make a very, very strong password. So strong that the user probably would not be able to remember it, making it somewhat useless as a password. – PoloHoleSet May 30 '17 at 21:54
  • @robotnik The average human has a vocabulary of 20K-35K words. Assuming the low end, 64^10 > 20,000^4 > 64^9 so even in the worst case a four-random-word password would have better entropy than a nine-random-character password... And additionally would be *significantly easier to remember and type* -- and the latter quality becomes especially useful when you have to enter passwords on a touch screen of a tablet or smartphone. – Shadur Aug 26 '21 at 10:44
  • @Robotnik ... Plus, even that is assuming that the attacker has correctly guessed *which language the target speaks*. – Shadur Aug 26 '21 at 10:45

3 Answers

65

It is true that you do not need numbers, special characters, etc. for a strong password. If you instead increase the length of the password, the entropy will increase as well. See for example this entropy table. To get about 64 bits of entropy, you could use a 14-character lowercase password, a 10-character password drawn from all printable ASCII characters, or a passphrase of 5 words randomly selected from a list of 7776 words (Diceware).
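
As a rough check of those three figures, here is a minimal Python sketch. It assumes every symbol is chosen uniformly and independently, and it takes 95 as the number of printable ASCII characters (counting the space); the function name is only for illustration.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of `length` symbols, each drawn uniformly
    and independently from `alphabet_size` possibilities."""
    return length * math.log2(alphabet_size)


lowercase_14 = entropy_bits(26, 14)    # 14 lowercase letters
printable_10 = entropy_bits(95, 10)    # 10 printable ASCII characters (assumed 95)
diceware_5 = entropy_bits(7776, 5)     # 5 words from a 7776-word list

print(round(lowercase_14, 1))  # 65.8
print(round(printable_10, 1))  # 65.7
print(round(diceware_5, 1))    # 64.6
```

All three land slightly above 64 bits, which is why they are roughly interchangeable in strength.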

The XKCD approach is also called a passphrase or, if done correctly (i.e. with randomly selected words), Diceware.

The math was checked at security.SE and is approximately correct if we assume a small word list for diceware (~2000 words) and - and this is crucial - randomly selected words. Do note that the XKCD comic assumes randomness for the words in the diceware passphrase, but assumes a pattern for the password (which is a fair assumption and also the point of the comic, as nobody remembers a truly random 11 char password).
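
The arithmetic behind that check can be sketched as follows. The 2048-word list is my assumption standing in for "~2000 words", the 1000 guesses per second is the attack rate the comic itself uses, and the 28-bit figure for the password is taken from the comic as given rather than re-derived.

```python
import math

WORD_LIST_SIZE = 2048       # assumption: 2**11 words, i.e. "~2000 words"
GUESSES_PER_SECOND = 1000   # the rate the comic uses for its time estimates

# Four words, each chosen uniformly at random from the list.
passphrase_bits = 4 * math.log2(WORD_LIST_SIZE)   # 44.0
password_bits = 28                                # the comic's figure, as given


def seconds_to_exhaust(bits: float, rate: float = GUESSES_PER_SECOND) -> float:
    """Time to try every one of the 2**bits possibilities at `rate` guesses/s."""
    return 2 ** bits / rate


days_for_password = seconds_to_exhaust(password_bits) / 86_400
years_for_passphrase = seconds_to_exhaust(passphrase_bits) / (86_400 * 365.25)

print(passphrase_bits)                 # 44.0
print(round(days_for_password, 1))     # about 3 days
print(round(years_for_passphrase))     # about 550 years
```

These are worst-case times for a full search; on average an attacker needs about half as long.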

Note that people are bad at random selection, so the actual difficulty to guess the password when not using dice to generate it - and accepting the first result - will be lower.
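
If you do not have dice at hand, the same idea can be sketched in Python with the standard library's `secrets` module, which draws from the operating system's cryptographic random source. The eight-word list below is a placeholder of my own, far too small for real use; a real list would have thousands of words (Diceware's has 7776).

```python
import math
import secrets

# Placeholder list for illustration only; real lists are much longer.
DEMO_WORDS = ["correct", "horse", "battery", "staple",
              "window", "river", "candle", "orbit"]


def generate_passphrase(wordlist, n_words=4):
    """Pick n_words uniformly at random (with repetition allowed)."""
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))


def passphrase_entropy_bits(wordlist, n_words=4):
    """Entropy in bits, assuming the list contains no duplicate words."""
    return n_words * math.log2(len(wordlist))


phrase = generate_passphrase(DEMO_WORDS)
print(phrase)                                # different on every run
print(passphrase_entropy_bits(DEMO_WORDS))   # 12.0 for this tiny list
```

The important part is accepting whatever comes out: regenerating until the phrase "looks nice" reintroduces the human bias the random choice was meant to remove.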

The claim about memorability was examined in the paper Correct horse battery staple: Exploring the usability of system-assigned passphrases (note that the paper uses correctly generated passphrases instead of relying on users "randomly" choosing passphrases):

Contrary to expectations, system-assigned passphrases performed similarly to system-assigned passwords of similar entropy across the usability metrics we examined. Passphrases and passwords were forgotten at similar rates, led to similar levels of user difficulty and annoyance, and were both written down by a majority of participants. However, passphrases took significantly longer for participants to enter, and appear to require error-correction to counteract entry mistakes. Passphrase usability did not seem to increase when we shrunk the dictionary from which words were chosen, reduced the number of words in a passphrase, or allowed users to change the order of words.

tim
  • 15
    I guess this shows that the simplest way to have strong individual passwords is to generate them randomly, and then promptly forget about them and count on your password manager to remember them for you... – Matthieu M. May 27 '17 at 19:20
  • 11
    @MatthieuM. Yes, a password manager is really the only good solution for web passwords, but it isn't practical in all situations (eg for full disk encryption). And you will of course also need to remember the password for the password manager as well as your OS user password (depending on your needs, it may - or may not - be a viable solution to write that down though). – tim May 27 '17 at 19:26
  • @tim Full disk encryption is not a problem. Accounts these days are shared between devices, so your master password database is stored in the cloud. The database's encryption, plus the service's like SpiderOak, protects it. You still need to remember two strong passwords, one for the cloud service and one for the password database, but that's much better than N strong passwords. Using a password manager in this manner means you don't lose your passwords if your disk fails or your device is stolen or you can't decrypt the disk. And you always have a mirror on your devices if the cloud fails. – Schwern May 27 '17 at 22:26
  • 4
    @Schwern I meant it's a problem because you can't save your disk encryption password in the password manager. I don't see how cloud services would help here; they would just add a fourth password you need to remember (OS user, disk, password manager, cloud service). I guess you could save your disk encryption password in a password manager in the cloud and access it via other devices, but that would add new attack vectors. As protection against losing the password manager db, cloud services would help, but so would safely stored backups. – tim May 27 '17 at 22:34
  • @tim Yes, you save your disk encryption password in the password manager and get it via another device like a phone or keep a backup. The biggest risks here are not technical, but on the user: forgetting to back up, and not using the password manager because it's inconvenient. The risk of the cloud is mitigated by the cloud password, the database's own encryption, and using cloud storage with end-to-end encryption like SpiderOak. You gain being able to use a password manager conveniently for everything, and not being vulnerable to forgetting to back up, or your home-rolled backup failing. – Schwern May 27 '17 at 22:41
  • @tim Basically, so long as you have a recently synced backup of the database somewhere, whether that's on another device, or in a personal backup, you only need to remember one password: the password for the password manager. From there you can bootstrap everything. I've done it many times. – Schwern May 27 '17 at 22:44
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/59423/discussion-between-tim-and-schwern). – tim May 27 '17 at 22:48
13

There is no single right answer to how much entropy a password has: the result will depend on the assumptions the attacker will make about it, and these are unknown. More or less reasonable guesses can be made about these assumptions, giving more or less reasonable entropy values.

This article at explainxkcd covers the comic in question. It explains the assumptions which were made to justify the calculations in the comic. These calculations are rather pessimistic (i.e. the attacker is supposed to know the exact password layout beforehand), and assume passwords based on dictionary words, so they are underestimating the entropy of both passwords a little.

For reference, password strength test finds that "Tr0ub4dor&3" has 52 bits of entropy (instead of 28 in the comic), while "correcthorsebatterystaple" has 94 bits (instead of 44). This estimate assumes that the attacker uses a dictionary of English letter-pair combinations. As you can see, the results are quite different, but the claim made in the comic holds for this calculation method too: a longer common-word password is better than a shorter password with special characters.

Dmitry Grigoryev
  • 1
    Like you say, it depends on the assumptions the attacker will make. The [zxcvbn strength calculator](https://dl.dropboxusercontent.com/u/209/zxcvbn/test/index.html) gives "Tr0ub4dor&3" only 36 bits of entropy, while "correct horse battery staple" gets 66 bits. – Mark May 31 '17 at 01:22
  • @Mark This still seems like an underestimation to me. For example, "correcthorsebatterystaple" is *known* to be a 4-word sentence. In reality, the attacker will probably check 3- ,2- and 1-word sentences, brute-force all passwords of 8 characters or less and try a few other common password schemes before they even start cracking 4-word passwords. – Dmitry Grigoryev Jun 07 '17 at 13:54
0

Your problem is with this assumption:

I am confused because I've always been told that having numbers, cap/non-cap letters and special characters was essential for a strong password...

This statement is theoretically true, but false when it comes to how people actually create passwords. Simply replacing an o with a 0 and an a with a 4 does not make your password stronger. At best, the new password has equivalent strength. In reality, it is likely weaker because these substitutions are so popular that an attacker may try 0 and 4 before even trying o and a. Similarly, capitalizing the first letter is so natural that it doesn't add anything to the security. And adding a digit at the end is also a popular choice that increases the password complexity just a tiny bit, even with the extra & character added. Even typos (such as troubador vs troubadour) don't throw off a dictionary attack by more than a tiny bit.

As an aside, NIST has recently moved away from this recommendation (as well as from regularly changing passwords and a number of other common password policies). In the real world, both served to undermine rather than enhance security.

Having digits and special characters only enhances security if you choose each character individually at random, simply because an attacker then has to guess each character among 90 or so possibilities. If you only use letters, an attacker has to guess among only 26 or 52 characters (depending on whether you mix upper and lower case).
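
To put numbers on that, here is a small sketch of how many truly random characters each alphabet needs to reach the comic's 44 bits. The alphabet sizes are the ones mentioned above (with "90 or so" taken as exactly 90), and the function name is only illustrative.

```python
import math


def chars_needed(target_bits: float, alphabet_size: int) -> int:
    """Smallest number of uniformly random characters whose total
    entropy is at least target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))


print(chars_needed(44, 90))  # 7  (letters, digits and symbols)
print(chars_needed(44, 52))  # 8  (mixed-case letters)
print(chars_needed(44, 26))  # 10 (lowercase letters only)
```

So the larger alphabet saves only a few characters, and only when every character really is random.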

In the case of Tr0ub4dor&3, those characters aren't random. This password is likely trivial to crack with a dictionary attack. Most people have an active vocabulary of around 1000 words or so, educated people of around 6000 or so (in English). These 6000 words would likely include troubadour. An attacker with a dictionary attack will of course try all the permutations of the word. So that's maybe 100 different versions of the word "troubadour".
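
To illustrate how little such variations add, here is a small sketch that enumerates variants of one base word. The substitution table and the capitalization rule are illustrative assumptions of mine, not the rule set of any real cracking tool.

```python
import math
from itertools import product

# Illustrative substitution table: each letter may stay as-is or be swapped.
SUBSTITUTIONS = {"o": "0", "a": "4"}


def variants(word: str) -> set:
    """All spellings of `word` with optional substitutions and an
    optionally capitalized first letter."""
    choices = [(ch, SUBSTITUTIONS[ch]) if ch in SUBSTITUTIONS else (ch,)
               for ch in word]
    result = set()
    for combo in product(*choices):
        candidate = "".join(combo)
        result.add(candidate)
        result.add(candidate[0].upper() + candidate[1:])
    return result


candidates = variants("troubador")
print("Tr0ub4dor" in candidates)    # True
print(len(candidates))              # 16
print(math.log2(len(candidates)))   # 4.0 extra bits at most
```

Under these assumptions the capitalization and substitutions multiply the attacker's work by 16, about as much as adding a single random lowercase letter would.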

Further, an attacker would try the most likely permutations first.

So an attacker may well try "Tr0ub4dor" before even trying the base word "troubadour". This word may be number 3000 in the password dictionary an attacker would try (although after the XKCD comic, it probably moved to number 20), so an attacker would have cracked your password in a few thousand to a few hundred thousand attempts - which may be a few minutes to a few hours with a decent computer.

Kevin Keane