Welcome back (or just "welcome" to the people that didn't read the last entry, if you are one of those, I highly encourage you to atleast skim it since I use a lot of that terminology here).
So, last time, we discussed the Caesar Cipher and why it is so easy to be cracked. So, if it is easy exploitable, what's the alternative, well, the next step up would be a polyalphabetic cipher.
One such famous cipher is called the Vigenere Cipher. To use it, both parties must agree on a passphrase, such as "EYE". Let the converted-to-numbers passphrase be represented by the set $M$. To get the ciphertext (say the plaintext is "ATTACK") , we repeated the set $M$ inside set $N$ until the length of $N$ is the same as the length of the plaintext set $P$, and then:
$P_i + N_i \equiv S_i (\mod{26})$
Lets do an example. With the plaintext "ATTACK", and the keyphrase "EYE", we write down the following:
$P = \{1, 20, 20,1, 3, 11\}$
$N = \{3,25, 3, 3,25, 3\}$
Now, to get the ciphertext in integers ,$S$, we just add them up, modulo 26:
$S = \{4,19, 23,4, 2, 14\}$
Which translates to: "DSWDBN"
Now, why is this better? In the plaintext, we have two T's, but, in the ciphertext, there are no letters that are repeated twice. This means that this cipher does not preserve frequency. Why is that important? That means that one cannot use frequency histograms anymore, because the ciphertext no longer has the same frequencies of letters as the plaintext!
So, that means that the Vigenere Cipher is more resilient to statistical methods of attack than the Caesar Cipher, but, what about the dictionary/brute-force attack we covered last time? Well, lets see.
For the Caesar Cipher, we only have to cover 26 shifts, whereas for the Vignere cipher, things are much more complicated. We must consider every possible passphrase, so, if we are given the ciphertext,
and it is 12 characters long, for example, we must test for all passphrases that are 1 character long, 2 characters long and so on until 12 characters long. So, if we have a ciphertext and it is length is $n$ characters, we see that the number of tries we must do is:
$26^1 + 26^2 + 26^3 + ... + 26^n = \sum_{i=1}^{n}26^i$
Since that's a geometric series,
$\frac{26(1-26^n)}{(1-26)}$
That's a lot, even for a computer. If implement a cracking algorithm, it'll run in exponential time, which is really bad.
But, what happens if you can guess how large the length of the passphrase is (just imagine, for a moment)? That becomes only $26^n$ tries, and it would turn into a sort of modified Caesar Cipher. That's not too bad, right?
Well, but, how would we guess the length of the passphrase? A clever chap named Friedrich Kasiski figured this one out for us. Its not always effective, but, when it is, it reduces computation time greatly. Basically, repeated words in the plaintext may by chance have the same letters in the passphrase they were encrypted by. Where this is true, frequency is preserved, and we can see how long the passphrase is. Of course, this would only work if there were any repeated words at all, and also if the repeated words coincided exactly with the passphrase used. So, it obviously isn't precise, but, if it does work, it works brilliantly.
In the next post, we'll take a big jump, and take a look at the Knapsack Cryptosystem.
As always, feedback is greatly welcomed, whether negative or positive doesn't matter.
Also, if you guys have any other ideas for articles you would like me to do, please comment below.
I haven't updated the github repo (https://github.com/Poincare/cryptoBlog) as of now, so, if someone could do that, that would be awesome (I'll probably do it before the end of today, my git needs reinstalling).