Perl Encryption Primer: One Time Pads Are Too Awesome To Use
2014-03-19
Part of my reason for writing this series is not just to provide a solid foundation in cryptography, but also to point out common bad advice. I ran into one such example on /r/bestof some months ago, and it’s stuck in my mind since then.
First, what is a One Time Pad? Start with a bunch of random numbers, which will be your key. Now use each byte of the key to modify each byte of the plaintext. There are a few different ways we could make that modification. Since we’re using a computer, the XOR operation makes a good choice.
What’s so special about XOR? It’s because it’s a reversible operation. Consider:
$ perl -E 'say ord("A")'
65
$ perl -E 'say ord("A") ^ 10'
75
$ perl -E 'say 75 ^ 10'
65
When we XOR a piece of data with a number, and then XOR the output with that same data, we get the original number. How can we use this for encryption? We’ll use a random number (which will form our key) and XOR that with the plaintext:
$ perl -MCrypt::Random=makerandom -E 'say makerandom( Size => 8, Strength => 1)'
172
$ perl -E 'say ord("B") ^ 172'
238
$ perl -E 'say chr(238 ^ 172)'
B
Let’s say Alice and Bob share a key, which is 172. Alice sends Bob the ciphertext 238. Bob knows to XOR this ciphertext with 172, which gets back the UTF-8 character for “B”. So congradulations to Alice and Bob–they’ve successfully sent the letter B to each other, and nobody else will ever know.
If Alice and Bob want to send a longer message while maintaining security, they’ll need a longer key. We’ll pretend that Alice and Bob only want to send plain ASCII messages, which means we only need to generate a key of 8 bits per character. They’ll want to send a 25 character message, so we ask Crypt::Random for 25 bytes of random data:
$ perl -MCrypt::Random=makerandom -E 'say makerandom( Size => (8*25) )'
1023627938326893352258393001582434381024900569975656239941143
That big long number is now our key. We’ll now use the Crypt::OTP module to encrypt our plaintext message, and encoding the output in base64 so we don’t have issues with unprintable characters. We’re also dealing with a really big number, so we’ll need the help of Perl’s bigint library.
$ perl -Mbigint -MCrypt::OTP -MMIME::Base64 -E 'say encode_base64 OTP( 1023627938326893352258393001582434381024900569975656239941143, "My grandma makes good pie", 1 )'
fEkSVERTWV1eWRNfV1NcQBNSXV1RGENQVg==
(The last ‘1’ parameter to OTP()
tells Crypt::OTP to use the string as-is, rather than treating it as a filename.)
To decrypt, Bob undoes the base64 encoding and passes Crypt::OTP the same key:
$ perl -Mbigint -MCrypt::OTP -MMIME::Base64 -E 'say OTP( 1023627938326893352258393001582434381024900569975656239941143, decode_base64("fEkSVERTWV1eWRNfV1NcQBNSXV1RGENQVg=="), 1)'
My grandma makes good pie
Notice that unlike most every other algorithm you’ll find in the Crypt namespace, OTP does not have separate encrypt/decrypt subroutines. Since Crypt::OTP uses the XOR operation as we showed above, encryption and decryption use the same subroutine. It’s not strictly necessary to implement OTP this way–it could also be done with addition and subtraction–it’s just the common way to do it with computers.
Alice and Bob have now shared a very important secret about pies, which they can guarantee will never, ever be read by anyone else, provided they’ve taken all the right precautions. It turns out that those precautions are a daunting challenge, which we’ll get to in a moment. For now, let’s see why we can talk so absolutely about the security of OTP (at least in theory).
We’ll bring in another character that’s commonly used in these crypto stories, Mallory. Mallory is a bad guy trying to read or otherwise screw up messages sent by Alice and Bob. Mallory knows that the message is 25 bytes long, so he generates a bunch of 25 byte keys and what they would decrypt. He sees among the messages:
- “The snipers are not ready”
- “Imtrappedincinncinatiohio”
- “Pecans taste horribleXXXX”
Ignoring any real-world context, each of these messages is equally likely. How can Mallory know which is correct? He can’t. He can brute-force a list of all 25 character ASCII messages, but will have no idea which is the right one. The right message will be in there, somewhere, but he has no reason to say it’s the right one.
As the last example shows, Alice and Bob could use padding to conceal the length of the right message, so Mallory only knows that the message is no more than 25 characters long.
Now that I’ve hyped up OTP as the ultimate encryption method, I want you to forget all about it for any purpose besides winning arguments on the Internet.
The problem comes in the provision we made earlier, that Alice and Bob have taken all the right precautions. It turns out that OTP is almost impossible to use in the real-world, for all sorts of technical and human issues. For these reasons, it is almost never used in practice. The theoretical advantages are a tempting siren song that we must ignore.
First, how does Alice transfer the initial key to Bob? If Alice can deliver it in person, that’s great, but if Alice and Bob and meet regularly and be sure of the security of the data being passed, why not pass the whole message? If Alice sends things by courier instead, Mallory could bribe the courier and copy the key.
If Alice encrypts the key with another cipher, we now run into the problem mentioned in the post on random numbers, which is building security in a chain. OTP is Perfectly Secure, and any other cipher will not be able to live up to that perfection. The other cipher is now part of your security chain; the key is no more secure than the cipher in question. So why not encrypt the message with that cipher instead?
(Key transfer is also a problem for block ciphers, but since their keys are fixed-length, they’re much easier to manage through other means. We’ll talk about transferring them around when we get to public key crypto.)
Let’s say that Alice and Bob have worked out all the issues with transferring the key. Things are going well for a while, but Alice notices that they’re almost to the end of the key. What to do?
The naive thing to do would be to reuse the key from the beginning. This is a mistake which makes it trivial for Mallory to break the key. Let’s say Alice uses the key 0xABCD with 0x1234, and later uses the same key for 0x5678. The results are:
$ perl -E 'say sprintf q{0x%X}, (0xABCD ^ 0x1234)'
0xB9F9
$ perl -E 'say sprintf q{0x%X}, (0xABCD ^ 0x5678)'
0xFDB5
Mallory isn’t quite sure what the first message had, but he’s pretty sure he knows what the second is. Knowing this, Mallory can XOR the second ciphertext with what he knows to be the plaintext, resulting in:
$ perl -E 'say sprintf q{0x%X}, (0xFDB5 ^ 0x5678)'
0xABCD
Which is the original key, which means Mallory can decrypt the first message, too! This is called a “known-plaintext” attack.
How would Mallory figure out the plaintext? There is invariably some underlying structure to the data being encrypted. English language text will have “the” appear quite often, among other things. A transaction to your bank might have the account information in the same location of a packet. Or Mallory could bribe a teller to get Alice’s bank account number.
It’s dangerous to assume that Mallory would have zero knowledge about what you’re sending, which is why cryptographers generally avoid algorithms with known-plaintext attacks. Since there are algorithms out there that don’t have this problem, why not use one of them?
So Alice and Bob have worked out the problems of transferring the key around, and they are hyper-paranoid about concealing anything that might lead to revealing the internal structure of the message. They’ve still forgotten something, which is how to generate the key in the first place.
Remember from the earlier discussion on random numbers that getting true randomness on computers is very hard. Getting really long strings of high-quality random numbers is even harder.
In Neal Stephenson’s book Cryptonomicon, a woman in WWII Britain is charged with generating random characters for use in OTP encryption. She is given one of those lottery machines with all the balls being blown around inside. Each ball has a letter on it, and she’s supposed to reach in, type the letter out on the ball, then stick it back in and draw another.
The problem is that she speaks English, and an English speaker will tend towards preference for E’s and T’s, and not very many X’s and Z’s. So she often threw the X’s and Z’s back in when she thought she was seeing them too often, and maybe added an E or T when she thought there weren’t enough. This biased the random number generator, allowing a smart German cryptographer to break the whole British OTP system. (It should be noted that this is a fictional story based around actual events and cryptography.)
All cryptosystems need good random numbers, but most of them don’t need very many. AES will do fine with only 128 or 256 bits, and it’s safe(-ish) to reuse the same key. An 8,000 bit document encrypted OTP requires an 8,000 bit key, and you can never use those random bits again.
What it all comes down to is that OTPs should almost never be used. There is one notable exception, which we’ll cover when we get to quantum encryption. We also have good reason to think that 256-bit block ciphers are more than adequate no matter what improvements in conventional computing come down the pipeline (or 512-bits against quantum computers). We’ll see why in the next post on block ciphers.