Kejeokeoqjn jeejqxk vl qazgevhajgcz

Supervisor: Dr L.J.M. Aslett

The name of this project is encrypted! Encryption methods have been in use for approaching 4000 years, as a means to solve the age old problem of conveying secret information safely even if the message is intercepted. Of course, the invention of cryptography inevitably gives birth to its antipode, cryptanalysis: the study of improving or defeating these systems ... we will be interested in the latter.

A (deterministic symmetric) encryption algorithm is defined by two functions, \( \text{Enc} : \mathcal{K} \times \mathcal{M} \to \mathcal{C} \) and \( \text{Dec} : \mathcal{K} \times \mathcal{M} \to \mathcal{C} \), where \( \mathcal{K} \) is the space of so-called "keys", \( \mathcal{M} \) is the space of messages and \( \mathcal{C} \) is the space ciphertexts. If I select a key \( k \in \mathcal{K} \) and some information I want to encrypt which lies in the message space \( m \in \mathcal{M} \), then crucially these functions must satisfy:

$$ \left. \begin{align*} c &= \text{Enc}(k, m) \\ \iff m &= \text{Dec}(k, c) \end{align*} \right\} \ \forall\ k \in \mathcal{K}, \forall\ m \in \mathcal{M}, $$

Furthermore, if the encryption algorithm is to be useful, we want recovery of \( m \) from \( c \) to be difficult without knowledge of \( k \) (even if the functions \( \text{Enc} \) and \( \text{Dec} \) themselves are known to the attacker).

One of the simplest encryption algorithms is the substitution cipher: here, the letters of the alphabet are permuted to form a key and then each letter in the message individually transformed according to the permutation. You can try it out using the live embedded code below — try changing the text to be encrypted, or generate a new key by clicking the "shuffle key" button:

\( m \) =
m_i:   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
m_i:   
⬇️   
KEY:   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
   
= c:   
Dec:   


Typically \( |\mathcal{K}| < \infty \), so a naive approach is to attempt a brute-force attack, whereby we systematically perform decryption of \( c \) with every possible key \( k \) until a plausible message \( m \) is revealed. However, it is not hard to see that in this case \( |\mathcal{K}| = 26! = 403,291,461,126,605,635,584,000,000 \). In other words, even if you can check a billion keys per second on a fast computer, it would take 12.79 billion years to check every key!

However, there are cleverer methods of attack, which might include a frequency analysis of letter occurance (Smart, 2004, Chapter 3.3). While this reduces the search space a bit, it is still a challenging problem to fully automatically crack the key for all letters.

Statistical attacks on substitution cipher schemes has an interesting origin: a psychologist from the state prison system in California approached the Statistics Department at Stanford University with a collection of coded messages that had been passed between prisoners (Diaconis, 2009). It was fairly clearly a substitution cipher, but the prison system couldn't crack the key and sought academic assistance. The Stanford statisticians devised an ingenious way to perform a guided random walk around the space of keys, \( \mathcal{K} \), using Markov Chain Monte Carlo methodology. They successfully cracked the code in a fraction of the time a standard cryptographic attack would take.

So what about the project title, "Kejeokeoqjn jeejqxk vl qazgevhajgcz"? Well, I have embedded in this webpage an implementation of the Stanford statisticians' MCMC method ... click the "Crack it!" button below to have your browser attempt to attack the encrypted course title and a little extra message from me. If your computer gets too hot you can pause the run with the stop button. Additionally, the method is not perfect (we can explore ways to improve it in the project) ... if you get to 10,000 iterations and still don't have a rough solution then you can click the reset button to start the algorithm from a new random location and then click "Crack it!" to try again.

c_i = KEJEOKEOQJN JEEJQXK VL QAZGEVHAJGCZ
      MV LVE ESNN JLZVLS SNKS CVP QVVN
      ECOK GAVUSQE OK
 
      
 
iteration :  
key guess :  
Message guess:

So in this project, we will explore some statistical attacks on encryption schemes, learning methodology such as Markov Chain Monte Carlo and simulated annealing along the way. The ultimate goal will be for you to implement an algorithm to perform fast attacks and successfully decrypt cipher texts from these schemes without access to the keys.

Modules

An essential prior module is Statistical Inference II, and the ability to progam in either R or Python.

It would be an advantage to have Data Science and Statistical Computing II, but the necessary background can easily be incorporated into your project study so it is not an essential prior module.

It may be enjoyable to take Cryptography & Codes III at the same time, but since we're looking at quite straightforward encryption schemes it is not an essential companion module.

References

Smart, N.P., 2004. Cryptography: An Introduction. 3rd Edition.

Diaconis, P., 2009. The Markov Chain Monte Carlo revolution. Bulletin of the American Mathematical Society, 46(2), pp.179-205. DOI: 10.1090/S0273-0979-08-01238-X