www2.rivier.edu Open in urlscan Pro
66.251.112.110  Public Scan

URL: https://www2.rivier.edu/faculty/vriabov/cs572aweb/Assignments/CrackingClassicCiphers.htm
Submission: On October 01 via manual from US — Scanned from DE

Form analysis 0 forms found in the DOM

Text Content

[ Home ] [ Up ] [ Cracking a Simple Cipher ] [ Cracking Classic Ciphers ]
[ Probability ] [ Team Research Project ]



 

Now that we've cracked a couple of simple, but short, ciphers, let's explore how
cryptographers might actually crack some classic ciphers.

Note: Remember that this web site contains a number of potentially useful Java
applets, which you may choose to use to help you with the work in this
assignment.


SHIFT SUBSTITUTION CIPHERS

A MonoAlphabetic Substitution Cipher maps individual plaintext letters to
individual ciphertext letters, on a 1-to-1 unique basis.  That is, every
instance of a given letter always maps to the same ciphertext letter.  

The oldest such cipher known is the Caesar cipher, where the mapping involved a
simple shift within the alphabet.  For example, the following represents a
Caesar cipher with a shift of 3:

Plaintext:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
Ciphertext: XYZABCDEFGHIJKLMNOPQRSTUVW

(Notice that we are doing a circular shift, by wrapping the end of the alphabet
around to the beginning.)

To encipher a message, we simply take each letter in the plaintext, find that
letter in the Plaintext row, and substitute the corresponding letter immediately
below it, in the Ciphertext row.  For example, using this substitution table, we
can take the message:

> Once more unto the breach, dear friends

and encipher into the following:

> Lkzb jlob rkql qeb yobxze, abxo cofbkap

Of course, to decipher the text, we simply reverse the process -- or
equivalently, use the negative of the original shift value.

Question 1: Here's the ciphertext for a message enciphered in the same way as
above:

> Qeb bkbjv mixkp ql xqqxzh lk Qrbpaxv jlokfkd

What is the plaintext for this message? (This should be really easy! -- you can
solve it manually, or use one of the Java Tools.)

In reality, because case, word spacing and punctuation in the ciphertext give
additional clues about the plaintext, they are usually removed, and the
ciphertext is often organized into groups of characters.  For example, the
ciphertext in the example above would then look like the following:

> LKZB JLOB RKQL QEBY OBXZ EABX OCOF BKAP

Of course, then it is the responsibility of the person doing the decoding to
reconstruct the original word spacing, etc.  Removing such clues from the
plaintext before enciphering it makes it quite a lot harder to crack the cipher.


FREQUENCY ANALYSIS ON SUBSTITUTION CIPHERS

When attempting to decipher a shift substitution ciphertext, if you don't
already know the number of characters to shift, of course, you need to figure it
out.  There are a couple of ways you might be able to do this:

 * Use brute force.
   
   In this case, we try every possibility, until we find a reasonable looking
   plaintext.
   
   Question 2: Given the approach described above, for a Shift Substitution
   Cipher, how many possibilities are there for a shift value? Is this a
   feasible task?
   
   You can try my Java applet that implements this, if you'd like.

 * Analyze the letter frequency of the ciphertext, and try to deduce the shift
   value.


MONOALPHABETIC SUBSTITUTION CIPHERS

MonoAlphabetic Substitution Ciphers employ a more complex approach:  Instead of
using a simple shift to determine the letter mapping, they select an individual
mapping for each character, where the relative position of the corresponding
characters is, in general, different for all characters.

So, in order to decipher a MonoAlphabetic Substitution cipher, you need to
determine the mapping for every character -- a lot more complex than just
determining a single fixed shift value.

Here's an example of a MonoAlphabetic Substitution Cipher.  Here's a letter
substitution table for such a cipher:

> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> OEXPVYQNAZGCDKURHJLBIFMSWT

where the top row represents the plaintext letters, and the bottom row
represents the corresponding ciphertext letters.

Using this table, we can encipher some plaintext into its corresponding
ciphertext:

> A bowl of Moose Tracks ice cream.
> O eumc uy Duulv Bjoxgl axv xjvod.

The top line is the plaintext, and the bottom line is the ciphertext.  You can
figure out how the translation is done by working through each letter of the
plaintext, and matching it with the corresponding letter in the substitution
table.

So, given some ciphertext, how would you determine the letter substitution
table?

Again, you really have the same two choices:

 * Use brute force.
   
   In this case, we try every possibility, until we find a reasonable looking
   plaintext.
   
   Question 3: Given the approach described above, for a MonoAlphabetic
   Substitution cipher, how many possibilities are there for character mappings?
   Is this a feasible task?

 * Analyze the letter frequency of the ciphertext, and try to deduce the
   character mapping.


HOW DO WE PERFORM LETTER FREQUENCY ANALYSIS?

The brute force approach is pretty self-explanatory, so let's examine the Letter
Frequency Analysis approach in more detail.

First, we need to recognize that we're making some assumptions about the
plaintext:

 * That it consists of characters, not some kind of binary code.
 * That it is written in some known natural language (in our case, English)
 * That we know the frequency of letters in a typical piece of text in that
   language.
 * That the plaintext is typical of English text, and so we expect the same
   frequencies of letters (approximately, within statistical fluctuations).

As long as we know that there is a 1-to-1, unique, mapping from plaintext to
ciphertext (and therefore also from ciphertext to plaintext), we can employ our
knowledge of those letter frequencies to help us crack a substitution cipher. 
Note that we need a large enough piece of text to give us some expectation that
we have a large enough statistical sample.  The longer the message, the better
statistical sample we are likely to have.

SOURCES OF LETTER FREQUENCY INFORMATION

To find known letter frequencies in typical English text, you can go to the
following web sites, among many:

 * http://www.simonsingh.net/The_Black_Chamber/frequencyanalysis.html
 * http://rinkworks.com/words/letterfreq.shtml
 * http://raphael.math.uic.edu/~jeremy/crypt/freq.html
 * http://www.central.edu/homepages/LintonT/classes/spring01/cryptography/letterfreq.html
 * http://deafandblind.com/word_frequency.htm

Here's a typical representation of the letter frequencies in typical English,
using a histogram/bar chart:



The left hand side is in order by the letter position within the alphabet, while
the right hand side is in decreasing order by frequency.

Note that the twelve most common letters in English are famously:

> ETAOIN SHRDLU 

(see http://www.worldwidewords.org/weirdwords/ww-eta1.htm)

However, in the above histogram, the most common letters are:

> ETAOIN SHRDLC

Notice that the last letter is C, not U.   

This is a useful lesson in itself.  Notice that the relative frequencies of U
and C are 2.75% and 2.78%.  That is, the frequencies of both are already quite
low -- certainly when compared with E at 12.72% -- and also quite close. 
Different sets of English texts will produce slightly different frequencies, and
the numbers are also subject to statistical fluctuations.

COUNTING LETTERS

So, all we basically do, given a piece of ciphertext, is to count the number of
occurrences of each letter, and from that build up a table that shows the
relative frequency of letters in that ciphertext.  Then we attempt to match it
with the known English letter frequencies, and try to figure out corresponding
letters -- and thus the substitution table.

So let's try it with the following ciphertext:

> Rxmm ksi uyklxtkz rxd ksi Zeuilatrm yvyxd, ksxz niyl?

Well, we could actually count the number of each of the letters in this by
hand.  However, let's do it by computer.  Here's what my program gives us:



So, let's make the simple assumption that the letters really do match up, based
on the English letter frequencies, and that would give us the following
substitution table:

> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> UGYDMPBVAKEINWJXQSHLCFZTOR

(In this case, the top line represents the input ciphertext frequencies.)

Then, if we use this table to try to come up with the corresponding plaintext,
we get:

> Stnn eha coeitler std eha Rmcaiulsn ofotd, ehtr waoi?

Clearly, not any plaintext that makes sense to me, anyway.

So what's the problem?  It could be that the plaintext doesn't follow the
assumptions we made about letter frequency analysis.   But it could also be that
there isn't enough content in the ciphertext to give us reliable statistical
information.

Let's look at what the original plaintext really was:

> Will the patriots win the Superbowl again, this year?

and the substitution table was:

> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> B??NU???E?TRLY???WHOPG?IAS

where the '?' letters represent "Doesn't matter', because the plaintext doesn't
contain any of the corresponding letters.

So, you'll most likely find that, with short ciphertexts, frequency analysis may
not help you much.

Consequently, the ciphertexts I'm asking you to decipher in this assignment are
much longer than we've seen before, simply to give you the additional
statistical significance.


THE ASSIGNMENT

So, here's what you need to do for this assignment:

 1.  First, I recommend that you download the code for the Java Tools, so you
     can run them locally on your own machine.
 2.  Bring up the Java Tools:
     
     
     
     and select "MonoAlphabetic Substitution Cipher".
     
     This will bring up a window which provides you with a lot of tools for
     cracking a monoalphabetic substitution cipher.

 3.  Play with this tool for a while, familiarizing yourself with its
     capabilities.
     
     * Try cutting and pasting some (fairly long) samples of English text into
       the Input Text area, and seeing how closely the letter frequencies match
       (or don't match) the typical English frequencies.
     * Try using the Autogenerate feature to see what happens to your text when
       you translate (encipher) it.  
       
       Question 4: What would you expect to happen in this case?
     * Try really encrypting some text, and then see how successful you are at
       decrypting it, using the tool and its features (as opposed to you doing
       it -- you already know what the plaintext is!)
       
       Note: The Input Text area is editable;  that is, you can enter text into
       it, edit that text, paste into it, etc.  On the other hand, the Output
       Text area is not editable;  you cannot enter text into it directly. 
       However, you can select text from the Output Text area and copy from that
       selection.  You can use the standard Ctrl/A and Ctrl/C key combinations
       to select the entire contents of the area, and copy those contents into
       the paste buffer, respectively.
 4.  Next, download the two ciphertexts:
     * Ciphertext 1 -- In this ciphertext, I have retained the original word
       spacing, punctuation, etc.
     * Ciphertext 2 -- In this ciphertext, I have removed word spacing,
       punctuation, etc., and organized the text in groups of 4 letters.  This
       will make it more difficult to decipher using the context that those
       clues provide.
     
     Note: In Internet Explorer, you should right-click on these links, and
     choose "Save Target As...".  This way, you prevent Internet Explorer from
     trying to put HTML tags around the files when you save them.

 5.  Deal with one ciphertext at a time (to keep yourself sane!)
 6.  Bring each ciphertext up in an editor (a plain text editor, not something
     like Microsoft Word)
 7.  Copy all of the ciphertext from your editor, and paste it into the Input
     Text area.
 8.  You can now use the Java tool to determine letter frequencies, and come up
     with a first approximation of the substitution table, based on those
     frequencies. 
 9.  Translate the text, based on the auto-generated table, and see if you can
     see anything that resembles recognizable plaintext.
 10. Using the tool, change individual letter substitutions until you believe
     you have the complete plaintext. 
     
     Hint: You will probably find that you will have to look at the generated
     "plaintext" and make educated guesses at what the words might be, using
     clues like partially-formed words, punctuation, and a knowledge of the
     common two-letter and three-letter words that exist in the English
     language.  If you get stuck, talk to someone else -- your wife, husband,
     girlfriend, boyfriend, parents, friends, etc. It is often the case that
     when you get stuck, just interacting with someone can lead you to overcome
     the problem, even if the other person doesn't actually come up with the
     solution.
     
     Note: For the second ciphertext, don't bother to go through the entire
     document trying to reinstate the word spacing, sentences, headings, etc. 
     I'm not expecting you to do that kind or amount of work!
     Instead, just tell me where the text comes from, who the author is, etc. 

 11. Copy the plaintext from your Output Text area (Ctrl/A, followed by Ctrl/C),
     and paste it back into a new document in your editor. 
 12. Print the result from your editor, or copy and paste it into your
     assignment report.


WHAT TO SUBMIT

First, refresh your knowledge of what I expect in an assignment report (and the
format of that report).

I want to see the following:

 1. Answers to all the questions I pose, above.
 2. A description of what you did.
 3. The results of your deciphering.

While it would be nice to see you come up with a correctly deciphered result in
each case, I am much more interested in a detailed description of what you did,
what you tried, what worked, and (especially) what didn't work, what your
thought processes were, your frustrations, any suggestions you might have for
improvements to the tool, and so on.  How hard was it to solve these ciphers? 
How much work did you have to contribute, compared to what the tool contributed?

Be sure to give credit where it was due, to all resources (human or otherwise)
you used to solve these problems.

 

This page was last changed on February 17, 2008