Principles of Password Design – Michael Harrison

Try to guess the most secure password:

a3@Bs!14z21Hb&!@mO89
LazyBumblebeesEvenOrange

Passwords are the face of computer security. Just about everything you use online or off—more often than not—requires an account and an associated password. Ultimately, the goal of that requirement is to simply provide a form of identity verification, asking: “is the person sitting behind the keyboard right now the same person that should have access to this resource?”

Unfortunately, passwords are one of the weakest forms of identity verification, especially as used by the vast majority of the computer-using population. To combat this, websites (particularly those where money is involved) began using password requirements to enforce security when creating a login. While this used to be entirely the purview of banking sites, such rules are now everywhere, and the proliferation of largely arbitrary, complex, and nonstandard password rules often DECREASES the relative security of the password that the user invents. On top of this, the sites implementing these complex rules often undermine their entire scheme by incorrectly managing their database or enforcing additional arbitrary requirements.

For some context here, check out some of the most common passwords:

I hope to illustrate some of the weaknesses of passwords by walking through some of the basic password attack vectors and the elements of cryptography employed in their creation. Hopefully, you can use this information to create more easily remembered, secure passwords.

Password Search Space/Depth

Have you ever considered why websites ask you to include UPPERCASE, lowercase, a number, and a symbol in your most secure passwords?

This is the first principle of password design: the number of permutations, or the “search space” required to brute force your chosen password. You might remember from high school the difference between combinations and permutations. Because you can use the same letter, number, or symbol multiple times and the order of the characters matters, passwords are counted as permutations (with repetition), not combinations. So how can we apply this?

Take your typical four digit PIN: 1111. Let’s list our assumptions:

Each character is a number from 0-9.
There are four characters.
Numbers can be used more than once and can be in any order.

Based on what we know, we can conclude that there are 10^4 (10,000) permutations, where 10 is the number of possible characters (0,1,2,3,4,5,6,7,8,9) and the exponent is the length of the password. Starting at 0000, then 1000, then 2000 all the way to 9999 we can simply generate each permutation of digits, try the PIN, and if it fails, try the next one. This is the essence of brute force password cracking—there is no information to guide the search—we just try each and every permutation until one of them works.
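To make the brute force loop concrete, here is a minimal Python sketch. The function name brute_force and the check callback are hypothetical stand-ins for whatever system validates the PIN; a real attacker would be submitting guesses to a login prompt, not comparing strings. Note that itertools.product counts up odometer-style (0000, 0001, 0002, ...), a different order from the one described above, but it covers the same 10,000 permutations.

```python
from itertools import product
import string


def brute_force(check, alphabet, length):
    """Try every permutation of `alphabet` at the given length.

    `check` is a hypothetical callback that returns True for the right guess.
    Returns (password, attempts), or (None, attempts) if nothing matched.
    """
    attempts = 0
    for candidate in product(alphabet, repeat=length):
        guess = "".join(candidate)
        attempts += 1
        if check(guess):
            return guess, attempts
    return None, attempts


secret_pin = "1111"  # the example PIN from the text
found, attempts = brute_force(lambda g: g == secret_pin, string.digits, 4)
print(found, attempts)  # 1111 1112
```

With this ordering, 1111 is found on the 1,112th guess, a little over a second at 1,000 guesses per second.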

So what happens to our search space when we change the rules? Let’s look again at our four character PIN if our user can also use lowercase letters. There are 26 lowercase letters: with our 10 digits + 26 lowercase letters the number of possible characters is now 36. Our password is still four characters long, so our exponent is still 4, giving us 36^4 or 1,679,616 possible permutations. Adding UPPERCASE (another 26 characters) and symbols (another 33 characters, counting the space) further increases our search space.

Numbers: 10^4, 10,000
Numbers+lower: 36^4, 1,679,616, roughly 168 fold increase over numbers
Numbers+UPPER+lower: 62^4, 14,776,336, roughly 9 fold increase over numbers+lower
Everything: 95^4, 81,450,625, roughly 5.5 fold increase over numbers+UPPER+lower

If I were to brute force this, generating each possible permutation at a conservative 1,000 guesses per second (a speed easily achievable by many graphing calculators), it would take just 10 seconds to run through every numbers-only password, about 30 minutes for numbers+lowercase, and about 22 hours for our full keyboard.
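The arithmetic behind these figures can be reproduced in a few lines of Python. This is a rough sketch: the names search_space and worst_case_seconds are mine, and the 1,000 guesses per second rate is the same assumption used above. The times are worst case, meaning the attacker has to try every permutation.

```python
GUESSES_PER_SECOND = 1_000  # the graphing-calculator rate assumed in the text

# Character set sizes: 10 digits, 26 lowercase, 26 uppercase, 33 symbols.
CHARSETS = {
    "Numbers": 10,
    "Numbers+lower": 10 + 26,
    "Numbers+UPPER+lower": 10 + 26 + 26,
    "Everything": 10 + 26 + 26 + 33,
}


def search_space(alphabet_size, length):
    """Number of permutations with repetition: alphabet_size ** length."""
    return alphabet_size ** length


def worst_case_seconds(alphabet_size, length, rate=GUESSES_PER_SECOND):
    """Seconds needed to try every permutation at `rate` guesses per second."""
    return search_space(alphabet_size, length) / rate


for name, size in CHARSETS.items():
    space = search_space(size, 4)
    hours = worst_case_seconds(size, 4) / 3600
    print(f"{name}: {space:,} permutations, {hours:.2f} hours")
```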

It becomes evident why websites like to require increased complexity, forcing you to use UPPER and lowercase, symbols, and numbers when creating your password. Unfortunately, this has the side-effect of forcing the creation of passwords like @Aj8, V&t1Kv4s, or Ba#$J5—passwords that are not only difficult to remember but are missing another crucial piece of the password puzzle.

Knowing what you know now, which of the following passwords will take the longest to guess?

A. 3ED45
B. 3eD45
C. 3ed4%
D. 33445

All of these passwords are five characters long, so the size of the character set decides it. C draws on numbers, lowercase letters, and symbols (10 + 26 + 33 = 69 possible characters), giving it the largest search space.

KEY: A: 17 hours, B: 1.5 weeks, C: 2.6 weeks, D: 2 minutes
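One way to check the key is to infer the character set from the classes a password actually uses, as an attacker who knows the rules might. The sketch below assumes the 95 printable ASCII characters (treating punctuation plus the space as the 33 symbols) and the same 1,000 guesses per second; alphabet_size is a hypothetical helper, not a standard function.

```python
import string

# The four character classes used in the text and their sizes (10, 26, 26, 33).
CLASSES = [
    string.digits,
    string.ascii_lowercase,
    string.ascii_uppercase,
    string.punctuation + " ",  # assumption: the space counts as a symbol
]


def alphabet_size(password):
    """Sum the sizes of every character class that appears in `password`."""
    return sum(len(cls) for cls in CLASSES if any(ch in cls for ch in password))


def worst_case_seconds(password, rate=1_000):
    """Time to try every permutation of the inferred alphabet at this length."""
    return alphabet_size(password) ** len(password) / rate


for label, pw in zip("ABCD", ["3ED45", "3eD45", "3ed4%", "33445"]):
    days = worst_case_seconds(pw) / 86_400
    print(f"{label}: {pw} -> alphabet {alphabet_size(pw)}, {days:.2f} days")
```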

Password Length

For the first element, we focused on the base of the permutation calculation, starting with 10 and ending on 95. As you might imagine, changing the exponent changes the number of permutations by orders of magnitude. In the previous example, our password was four characters long. What happens to our permutations if we add just one character?

Numbers only: 10^5, 100,000, 10 fold increase over 4 characters

Numbers+lower: 36^5, 60,466,176, 36 fold increase over 4 characters

Numbers+lower+UPPER: 62^5, 916,132,832, 62 fold increase over 4 characters

Everything: 95^5, 7,737,809,375, 95 fold increase over 4 characters

Notice a difference? Unlike the diminishing returns we observe when incrementally increasing the search space at a given length, we see literal exponential increases in the search space by simply adding one or two characters.

Revisiting our graphing calculator brute force attempt (1,000 guesses per second), cracking our everything-password goes from 22 hours with four characters to roughly three months (about 90 days) with five, an enormous difference.
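A short sketch of how quickly the worst-case time grows with length, still assuming all 95 printable characters and 1,000 guesses per second. The eight character upper limit is only there for illustration.

```python
ALPHABET_SIZE = 95          # every printable ASCII character
GUESSES_PER_SECOND = 1_000  # same rate assumed throughout the text
SECONDS_PER_DAY = 86_400


def worst_case_days(length):
    """Days needed to try every permutation of the given length."""
    return ALPHABET_SIZE ** length / GUESSES_PER_SECOND / SECONDS_PER_DAY


for length in range(4, 9):
    print(f"{length} characters: {worst_case_days(length):,.1f} days")
```

Each extra character multiplies the time by 95, so five characters take about 90 days and six already take more than 23 years.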

We aren’t finished yet: we’ll now look at the final and most important component.

Using similar passwords from the previous section, which of the following passwords will take the longest to guess?

A. 3ED4517
B. 3eD45
C. 3eD4%
D. 3344526

Despite C encompassing the largest search space (an everything password), both A and D are two characters longer. However, despite what you might hear on TV, length is not everything. D contains only numbers, a very small search space, which despite the increase in length leads to a relatively small number of permutations and by far the shortest brute force time. Choice A, with its mix of length and search space, has the longest brute force time of these options by a considerable margin.

KEY: A: 2.5 years, B: 1.5 weeks, C: 3 months, D: 3 hours

Password Entropy

This one is a little more complicated than the previous two. Just like we learned in thermodynamics, entropy is randomness. Therefore, a strong password will have a high degree of randomness that makes it difficult to guess. Essentially, password entropy is a mathematical measure of a password’s “guessability” by a brute force attack. Recall the brute force method: generate each password permutation in order, aaaa > baaa > caaa etc. Assuming that we, as the attackers, know the password generation rules, we can determine the size of the search space and the probability of guessing any password at random. It stands to reason then that a randomly generated password with n characters will sometimes be generated very early in the total search space (if the password was aaaa, for instance) and sometimes be generated very late (in the case of zzzz). On average, a brute force attack will find the password half-way through a given search space. Mathematically, we can represent the number of “bits” of entropy via the base-2 logarithm of the total number of permutations.

To illustrate this, let’s return to our first four digit PIN: 1111. Our total search space was 10^4, or 10,000 permutations. By taking the base-2 logarithm we can determine the number of bits of entropy. Solving 10,000 = 2^n for n gives n = log2(10,000), or about 13.288 bits of entropy.

Now what would a password look like with just one additional bit of entropy?

2^14.288 = x, giving us a total number of permutations of 20,000. As you might expect given a base-2 logarithm, one additional bit of entropy doubles the number of guesses required for an attacker to find the correct password.
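The same numbers fall out of a couple of lines of Python using math.log2. The helper name entropy_bits is my own, and the calculation assumes every character is chosen uniformly at random.

```python
import math


def entropy_bits(alphabet_size, length):
    """Bits of entropy for a uniformly random password: length * log2(alphabet)."""
    return length * math.log2(alphabet_size)


pin_bits = entropy_bits(10, 4)
print(f"4-digit PIN: {pin_bits:.3f} bits")                       # 13.288 bits
print(f"One more bit: {2 ** (pin_bits + 1):,.0f} permutations")  # 20,000
# On average an attacker succeeds after trying half the search space.
print(f"Average guesses for the PIN: {2 ** pin_bits / 2:,.0f}")  # 5,000
```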

In a way, entropy is an extension of our first and second principles. Both length and search space factor into the entropy of a password. If entropy is entirely based on the number of permutations and that is based entirely on our first and second principles, why exactly are we talking about this?

The reason is randomness. Humans are ludicrously bad at coming up with random passwords and computers are only slightly better. Unlike the prior two principles, entropy is important due to the social engineering component of brute force attacks, something I will cover further in the future as it is beyond the scope of this discussion.

For now, we’ll focus on what “randomness” actually means in this context. Surely, the password GmKa$5VM is random, but in what way? Each character was chosen as randomly as possible from a character space of 95 possible choices, giving about 6.6 bits of entropy per character (the base-2 logarithm of 95). The idea here is that we can randomly generate passwords with a high degree of entropy; however, these passwords are a nightmare from a usability perspective. Most people have trouble remembering more than a small handful of passwords, let alone ones that are literally designed to be pattern-free.

But what if we come at this from another direction? Take Diceware for instance, a wordlist that contains 7,776 short English words. If we were to randomly select a word from this list, the probability of correctly guessing it is 1/7776, or nearly 13 bits of entropy per element. Compare this to the probability of guessing any random character: 1/95, or about 6.6 bits. As bit strength is additive, a randomly selected four character password such as “aaaa” has an entropy of (6.6+6.6+6.6+6.6) about 26 bits. Selecting four random words from the Diceware list, “climbharrowjumpalong”, gives us a total entropy of (13+13+13+13) about 52 bits, and that figure depends only on the number of words, not on how many characters they contain.
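Here is a hedged sketch of Diceware-style selection in Python. The eight-word WORDS list is a made-up stand-in so the example runs on its own; a real Diceware list has 7,776 entries, and you would load that file instead. secrets.choice is used rather than random.choice because it draws from the operating system's cryptographically secure random source.

```python
import math
import secrets

# Hypothetical stand-in list; the real Diceware list has 7,776 words.
WORDS = ["climb", "harrow", "jump", "along", "lazy", "orange", "even", "bumble"]


def make_passphrase(wordlist, count=4):
    """Pick `count` words independently and uniformly at random."""
    return [secrets.choice(wordlist) for _ in range(count)]


def passphrase_bits(wordlist_size, count):
    """Entropy of `count` uniformly chosen elements: count * log2(wordlist_size)."""
    return count * math.log2(wordlist_size)


print("".join(make_passphrase(WORDS)))
print(f"4 words from this toy list: {passphrase_bits(len(WORDS), 4):.1f} bits")
print(f"4 words from 7,776: {passphrase_bits(7776, 4):.1f} bits")
print(f"4 characters from 95: {passphrase_bits(95, 4):.1f} bits")
```

Four words from the full list work out to about 51.7 bits, against about 26.3 bits for four random characters. The entropy comes from the random selection, not from the words themselves: a phrase you pick by hand does not earn these bits.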

Ultimately, entropy is about creating passwords that are unique and difficult to guess.

The Takeaway

We’ve covered three principles of creating a strong password:

Search space: the number of possible things that each character in a password is drawn from
Length: how many characters you need to guess correctly and in the proper order
Entropy: how unique and difficult to guess your password is, as a measure of the other principles

It is important to remember that none of these principles alone can create a strong password. The word “password” is eight characters long (not a minimal acceptable length, in my opinion) but is also one of the most commonly used passwords on the planet. The password a@B4 covers the search space requirement but is only four characters long. However, the password that combines both of the first two elements, “a@B47N%p”, is virtually impossible to remember! As Randall Munroe of XKCD points out, we have been training users for decades to create passwords that are both difficult for humans to remember and easy for computers to guess. My online banking site, for instance, requires numbers, letters (both upper and lower case) and symbols, but caps passwords at EIGHT CHARACTERS, almost entirely defeating the purpose of the increased search space.

So how can we create passwords that are both usable and secure?

Passphrases: picking several words at random from a dictionary or creating a unique sentence. Bonus points for incorporating punctuation and numbers for search space considerations.

Site-handing/padding: adding between 5 and 10 memorable characters to the beginning or end of your password. Ideally, these characters can be generated from some aspect of the site you can easily remember.

The world of IT security is enormous and these techniques only scratch the surface of things that you should be considering when keeping yourself safe online. I hope to cover more of the social engineering aspects in a future DYK, but for now, think about your passwords. How long would they take to crack on my graphing calculator, let alone my GPU used for video gaming, or the supercomputer used by a motivated nation-state? You can put it to the test and find out.

For some further reading, check out this paper from Microsoft about common user password creation habits. You’ll probably be surprised (or not) to find out how users create passwords.

Now, after all of that, which of the passwords above is the most secure? At this point, you’re probably unsurprised to learn that password 2—LazyBumblebeesEvenOrange—is orders of magnitude more secure than password 1. Just don’t go using it anywhere!

Happy security!