Perl Encryption Primer: Passwords
2014-04-01
We’ve covered most of the major building blocks of encryption and how you can use them in Perl. With those building blocks in mind, we can cover one of the most common uses of encryption that a programmer will encounter, which is how to handle passwords.
Password handling has a long history, most of it bleak. Many systems started without any passwords at all; keeping the computer physically locked in the room was considered good enough. As time went on, the machines got passwords, but CPU time was still expensive, and programmers hadn’t thought much about the security implications of storing passwords in plaintext. And that is what they did.
Still more time passes, and somebody figures that having a big database of passwords in plaintext isn’t such a good idea, so the crypt()
function on Unix is used to encrypt things. This was originally based on DES (which, as we know from the block cipher post, is no longer good enough), and it truncates your password to 8 characters of 7-bits each. It then uses that password with a 12-bit salt to encrypt a string of all zeros.
What’s a salt? It’s a random value appended to the string, which should be unique for each user. We use these to thwart an attack called a Rainbow Table. Here, an attacker knows that you don’t use a salt, or knows the salt you use on all your passwords. With that information, the attacker can precompute a list of passwords and their assoicated hashes.
In the example below, we assume the same salt (“12345”) is used for crypt()
(like much of the standard C library, this function is available to us directly in Perl). We start with the password “aaaaaaaa” and iterate through a few possibilities using the magic string mode of the ++ operator:
$ perl -E '$p = "aaaaaaaa"; for (1..5) { say $p . ":" . crypt( $p, 12345 ); $p++; }'
aaaaaaaa:12Tez3EWIho0.
aaaaaaab:12kdu.nuXXcPQ
aaaaaaac:12XbtbeyxB4bQ
aaaaaaad:12s3LAvuhOz9E
aaaaaaae:12CP4cko54fzk
Combined with a dictionary of common passwords (and variations like “p4ssw0rD”), the attacker doesn’t need to brute force each password individually. They just need to match the hashed value to the plaintext value. With a bit of googling, you’re sure to find textfiles of precomputed Rainbow Tables, so the attacker doesn’t even have to do the hard part themselves.
Getting back to crypt()
, we once again note that it forces an 8-character password of 7-bits per character (plain ASCII). This would be 8 * 7 = 56 bits, which is the DES key length. We already know that this is inadaquate against modern brute force attempts. On top of that, even security-concious users don’t use passwords with the entire ASCII range, and many applications will break if you tried (ever use a password with a vertical tab in it?). That puts further limits on the size of the search space.
In short, crypt()
should be completely avoided.
The next generation used a cryptographic hash. The advantage of this method is that the original plaintext password can’t be feasibly recovered, even by the people who own the database or write all the code. (If a web site’s password recovery system ever emails you back your original plaintext password, that means they didn’t do it right.)
Doing things this way usually started with MD5:
$ perl -MDigest::MD5=md5_hex -MCrypt::Random=makerandom -E '$salt = makerandom( Size => 32 ); say $salt . ":" . md5_hex( $salt . "foobar" )'
3892167202:c3d2ca5074af216141d567d1b3aebcbd"
As weaknesses in MD5 were exposed, things moved to SHA1 and others, but it was the same idea. When the user wants to be authenticated, we hash the provided password with the same salt (the part before the colon above) and check if we get the same value as our stored hash.
One weakness happens with Timing Attacks. With the standard string eq comparison (or equivilent in most other languages), a string is matched by going through character-by-character to see if each one matches in turn. It fails as soon as it hits a non-matching character.
This means that "aaaaab" eq "aaaaac"
takes slightly longer to fail than "aaaaab" eq "aac"
. Mallory can use this slight difference in speed over many different requests to see if he’s getting closer to the right password, somewhat like the Mastermind game. This fail-fast property helps keep programs speedy, but it’s a problem here.
You might think that this difference in speed would be too small to be practical over the Internet, but in fact it was sucesssfully done against OAuth and OpenID.
Mallory’s attempt is complicated by salt and hashing, but making the system completely immune is not difficult, so why not just do it? Here’s how:
sub match_strings
{
my ($self, $str1, $str2) = @_;
return 0 if length($str1) != length($str2);
my @str1 = split //, $str1;
my @str2 = split //, $str2;
my $does_match = 0;
foreach (0 .. $#str1) {
$does_match |= $str1[$_] ^ $str2[$_];
}
return $does_match == 0;
}
We start by checking that the strings are the same length. If we were matching plaintext passwords, this would tell Mallory that he needs a longer or shorter attempt. Note that hashed passwords of the same encoding type will always match in length.
Next, we split the strings into individual characters and iterate over them. With each loop iteration, we XOR the same location in the strings with each other. We then OR that with the current result. If this result ends up staying zero all the way to the end, the strings match.
So far, we know that we should use an irreversible method of encryption, we should use a salt from a cryptographic random source, and we should match the hashed strings in a way that does not fail-fast. Up until a few years ago, that would have protected you against any practical attack out there.
As tends to happen, something new came along. Graphics Processing Units were being used in more and more general computing tasks, and it was found that a GPU could be used to crack hashed passwords. The massive parallel processing speeds offered by GPUs allowed all our then-best schemes to fail.
The mistake we made is that general cryptographic hash algorithms are meant to be fast. Not so fast that they tradeoff security, but all things being equal, a fast algorithm is obviously better than a slow one. Trouble is, if it’s fast to hash a password on your end, it’s equally fast for an attacker to do it on their end.
We are saved here by the fact that a user logging in to a system generally doesn’t mind if it takes an extra second for that one request. But for an attacker making millions of attempts to brute force a password, that extra second adds up. Since we get to choose the algorithm, we should choose one that’s delibrately slow, perhaps one that we can tune to make slower as Moore’s law picks up.
This is where bcrypt comes in (so called because it’s a variation on the Blowfish block cipher). With bcrypt, there is a “cost” parameter that can be tuned upwards as Moore’s Law makes CPUs and GPUs faster. This parameter controls how many loops the algorithm will go through before producing the final output.
The advancements don’t stop with bcrypt, either. GPUs are good at raw processing, but terrible when moving things in and out of memory. So if you use an algorithm that forces you to move things in and out of memory a lot, Mallory can all but give up on cracking passwords with GPUs. This is what scrypt does. The algorithm is still a bit new and hasn’t been completely vetted yet by the cryptographic community, but it may become the new gold standard for storing passwords.
How should we implement all this? One observation from all the above is that for a few decades now, there’s been a war between new attacks and new ways to thwart attacks. We don’t appear to be anywhere near the end of it. That means we should code our apps in a way that can absorb a new algorithm easily. If tommorow morning, bcrypt turned out to be totally broken, you should be able to switch to scrypt in production by early that afternoon at the latest.
When we store the password, we need to store something that shows how it was encrypted. We then have a configuration value somewhere that lists the preferred encryption method for passwords. For algorithms that have cost parameters, like bcrypt and scrypt, we also need to encode that information and configure it alongside the encryption type.
Fortunately, Authen::Passphrase gets very close to doing everything right. Using a string representation based on RFC 2307, it encodes the encryption type, salt, any cost parameters, and encrypted string into a single compact value. The one thing it doesn’t do is match strings in a way that protects against timing attacks.
I also think it could benefit from a plugin system. The current implementation uses lexical hashes within the package to map encryption types to the implementing class, with no way of adding to those hashes outside of the lexical scope (at least, not that I’ve found). New encryption types could be added by CPAN dists more easily if there was a way to add new types dynamically. By breaking out existing classes into separate dists with a plugin system, it could also fix a criticism in one of its CPAN reviews, which is that it’s a maze of twisty dependencies.
Regardless of the above, this is still the best CPAN module I’ve found for storing passwords in Perl.
Let’s say we have a new user signed up with their passphrase in $passphrase
. Do this:
# These should come from a general configuration
my $PASSPHRASE_CLASS = 'Authen::Passphrase::BlowfishCrypt'; # AKA bcrypt
my %AUTHEN_PARAMETERS = (
salt_random => 1, # Creates a good salt for us
cost => $BCRYPT_COST, # Tweak as necessary
);
my $auth = $PASSPHRASE_CLASS->new(
%AUTHEN_PARAMETERS,
passphrase => $passphrase,
);
my $hashed_passphrase = $auth->as_rfc2307;
We now save the $hashed_passphrase
string into our user database.
Later, the user comes back and wants to login. With our stored encrypted password in $db_passphrase
, and the incoming passphrase to be checked in $passphrase
, we first check if the password matches. If it does, change the encryption if necessary:
my $matched = 0;
my $auth = Authen::Passphrase->from_rfc2307( $db_passphrase );
if( $auth->match( $passphrase ) ) {
if( ref($auth) ne $PASSPHRASE_CLASS ) {
my $new_auth = $PASSPHRASE_CLASS->new(
%AUTHEN_PARAMETERS,
passphrase => $passphrase,
);
my $new_db_passphrase = $new_auth->as_rfc2307;
# Save $new_db_passphrase back into your database for this user
}
# Also check for bcrypt cost, but be sure not to assume bcrypt is used. Left as an
# exercise to the reader.
$matched = 1;
}
# return $matched for successful/unsuccessful login
That covers the technical end of things. What is sometimes overlooked in these articles is the political game. If you’re working on an existing system with an outdated password scheme, how do you convince your boss that this is important and needs to be fixed?
It’s common for managers to say “Well, that only matters if someone had already broken into our system, so we’ll just protect against that”. Now, you might recall our earlier discussion on security as a chain vs. security as layers. If you’re relying only on boarder firewalls to keep the evil out, then breaking just that firewall gets the attacker everything. If the attacker also has to contend with breaking an encrypted password database, you now have security in layers. Layers are Just Better.
Explaining this to the more clueless managers still won’t be enough, so instead, point out some high-profile breakins. Sony was broke into a while back, and it was found that their password database was in plaintext. Gawker also had a data breach, and their passwords were using the outdated crypt()
. These are two well-known tech companies who were utterly embarrassed by their password handling practices.
Perhaps there were a few employees at both those companies who wanted to change things, and perhaps they were stopped by a manager who thought that a data breach would never happen to them.
Don’t let this be a reason for doing nothing, but given the sad state of the industry, I suspect that if you’re currently using salted passwords with SHA1, you’re ahead of far too many companies out there.
We’re almost finished with this series, but there is one more thing I’d like to discuss because there’s so much bad information out there: Quantum Computers and Quantum Encryption. That’ll be in the next post.