Wednesday, April 27, 2011

Cracking The Vigenere Cipher

I wanted to use Vigenere as the cipher system for the 3rd message on the competition to win a domain name, but changed my mind due to the number of online tools that might help decode the message easily even if you know nothing about how Vigenere works.

Before you start reading about how deciphering Veginere works, I invite you to take a look how Vigenere is used to cipher messages from the Wikipedia article here, in fact Vigenere was called “le chiffre indéchiffrable” or the undecipherable cipher, because any means of cryptanalysis invented before it was defeated, till Charles Babbage found a clever, very clever, way to crack it.

After finishing reading the Code Book the last week, I started decoding the different enigmas proposed at the end of the book, and yesterday I started deciphering the 4th enigma which is a Vigenere cipher (that I finished yesterday too), and I find it pretty amazing for starters to try. I must notify you that this short article will contain the solution to the enigma.

I chose to go old way while deciphering this, finding the key by hand, then decoding the message step by step.

0. The message to decode :

K Q O W E F V J P U J U U N U K G L M E K J I
N M W U X F Q M K J B G W R L F N F G H U D W
U U M B S V L P S N C M U E K Q C T E S W R E
E K O Y S S I W C T U A X Y O T A P X P L W P
...

The complete message can be found here.

1. Finding the length of the key :

Babbage’s method for finding the key length was that the repeated text in the ciphered message is probably produced from the same repeated letters from the clear message that are ciphered with the same part of the key. Hence finding an estimate between the repeated series of letters in the ciphered message would give us a good hint about the length of the key

Clear : T H I S I S A C L E A R T E X T T H I S C A N B E E A S L Y D E C
Key :   C O D E C O D E C O D E C O D E C O D E C O D E C O D E C O D E C
Crypt : V V L W K G D G N S D V V S A X V V L W E O Q F G S D W N M G I E

As you can see the series of latters T,H,I is ciphered to the same series V,V,L, because it was situated at the same relative position according the key, also E,A is ciphered to the same S,D.
The key length is at worst the spacing between these repetitions in the ciphered text, and usually it is the greatest common divisor of the spacing of (most) the repetitions (why? give it a second thought.).


285181560
This is the table of the distances from the Vigenere message (done by hand)

Anchor
Repetition
Distances
Divisors
WXIZAYG 2 190
1 | 2 | 5 | 10 | 19 | 38 | 95 | 190
EFVJ 2 220
1 | 2 | 4 | 5 | 10 | 11 | 20 | 22 …
WUU 3 95
1 | 5 | 19 | 95
EEK 3 200
1 | 2 | 4 | 5 | 8 | 10 | 20 | 25 …
UUN 2 130
1 | 2 | 5 | 10 | 13 | 26 | 65 | 130
MEK 2 135 ..
EKJ 2 515 ..
UXF 2 135 ..
JBJ 2 60 ..
GFB 2 30
1 | 2 | 3 | 5 | 6 | 10 | 15 | 30
PNT 2 35
 5 | 7

As you can see, all the repetitions have 5 as a common divisor, so we will suppose that the key length is 5.

2. Breaking the text into (key-length) sets:


Now that we know the key length, we will make sets from the coded text, so that we group every letter codes with the first letter of the key together, the second group will be the one coded with the second letter of the key etc…

C O D E

T H I S
I S A C
L E A R
T E X T
T H I S
C A N B
E E A S
L Y D E
C

Each group is ciphered with one letter of the key, and better yet, the resulting group is ciphered with a simple Caesar cipher. (notice that we group the ciphered text, not the clear one –do we have the clear one anyway?-)

Here is a sample code that takes a ciphered text and a key length and returns a list of simple Ceasar ciphers from the Vigenere cipher (C# code):


3. Deciphering the Ceasar groups:


Now that we have the groups from the Vigenere cipher, it’s very easy to decipher each one apart (each one has a 26 possibility, so a key length of 5 gives us a 26^5 case to try).

But we can do better with frequency analysis, we need only to locate the Caesar key for each group, so if the text if for example in French or English, we can locate the letter “e” (most frequent one) and deduce the key from this letter.

If for example in the first group, the letter “s” is the most repeated one, we know that “e” is replaced by “s” and hence the key is “s”“e” = “m”.

The following piece of code, analyses the frequencies and tries to find the key (supposing that the text is long enough and that the letters are distributed uniformly)


4. Cracking Vigenere :


The key, is the concatenation of all the Caesar keys we found in each group from the above step. Now that we have the key, reversing Vigenere is a game, here is a sample code that does this:


You can of course try different key lengths with this program and see if the text has any meaning without going through stage one to find the key length. (or you can calculate it directly using Friedman test)

5. Source Code :

https://gist.github.com/944422

I really wanted the last message of the Win a free domain name competition to be a Vigenere cipher since it is so much fun, but unfortunately, it won’t be. Have fun ;)

blog comments powered by Disqus