Stonehill College

Data Structures and Discrete Mathematics Learning Community

Professors Ralph Bravaco and Shai Simonson

Cryptography Laboratory

Introduction

In this lab, you will experiment with cryptographic techniques that are the foundation for today’s e-commerce and internet communication.

In World War II the war behind the war was all about the British trying to intercept and decode German transmissions. (Check here for the contributions of the famous pioneer computer scientist, Alan Turing in this effort.)

A simple version of the codes used in World War II works like this.  The alphabet is shifted over by some number of letters, so that “a” might become “f”, and “b” would be “g”, etc., wrapping all the way around.  We could call that the f-shift.  The substitutions for the f-shift are shown below:

abcdefghijklmnopqrstuvwxyz
fghijklmnopqrstuvwxyzabcde

For example, the word “shai” is encoded into “xmfn” using the f-shift.  Note that the f-shift is nothing more than adding 5 to each letter’s ascii value.  You need to be careful to wrap around appropriately modulo 26.

If that was all there was to encoding, then a code breaker would only have to try 25 possibilities to break the code.  A codebreaker simply tries translating the message using each of the letter shifts until one translation looks like a real message.

The Germans were more clever in creating their encodings.  The trick they used (more or less) was to have each subsequent letter use a different shift.  After a while these shifts would cycle.  The sequence of shifts was represented by a code word consisting of letter shifts.  For example, if the codeword was “remarkable”, then the first letter of the message would shift by 17, the second letter by 4, …, the 7th letter by 0 etc.  The 11th letter would shift by 17 again like the first and the cycle would start to repeat.

For example, the first few characters of the sentence “We meet in New York for a rendezvous” would be encoded as “Ni ye…”.  Note that spaces between words were not actually transmitted or encoded.

Checkpoint:  See if you can encode the rest of the sentence yourself by hand.

For a codeword like “remarkable” of length ten, there are now 2610 different possible encodings.  The longer the sequence of the cycle, the more possibilities there are.

Breaking the Code

Once one knows the length of the cycle, there are a number of techniques for breaking the code without having to try all the possibilities. One idea is to divide the letters of the encoded message up into 10 groups, one for each shift in the cycle.  The 1st, 11th, 21st etc make the first group.  The 2nd, 22nd, 32nd etc. make the second group.  And so on.  The letters in each group are encoded using the same shift.

We try to decode the letters a group at a time.  For each group, we try all 26 possible shifts, but since the letters in each group are scattered throughout the message, we are unlikely to learn anything by doing this.  The chance of something looking familiar will be very small.

There is a way to learn something about the message nevertheless.  Every language has a characteristic statistical frequency for each letter.  For example, in English the letter “e” is the most common.  For a given group, we compare the frequencies of the letters in each of the 26 decodings to the expected frequencies.  Matching the two sets of frequencies helps identify the shift or at least narrows down the number of possibilities from 26 to just a few.

The Germans actually used a more complex version of cycles of shifts that required an even more sophisticated computer-aided decoding effort.  The end of the story was that the British were successful in their efforts and the Allies prevailed.

Program 1.  A cycle of shifts is described by a codeword like REMARKABLE, where each letter represents a shift, A is 0, B is 1, … and Z is 25.  Write a program that takes a text and a codeword, and produces the encrypted text.  You will need to use the % (mod) operator to make sure you wrap around appropriately.   You might want to keep the codeword in circular linked list.
Program 2.  Modify your program so that it can decode as easily as it encodes. All you need to do is add an option to decode, and then do subtraction rather than addition.

Make a note of how symmetric this is!  This was the hallmark of cryptography since ancient times, until the last twenty years.  If you knew how to encode something, then you knew how to decode it.  You might be thinking “well duh!!”

Determining the Cycle Length

An ingenious method to determine the cycle length was discovered by Philip Friedman whose paper is described by David Kahn in his seminal history of cryptography  as "the most important single publication in cryptology."   Friedman defines the index of coincidence, a statistical measure of text, that for normal English is about 6.6%, but for a random collection of letters is only about 3.8%.

To determine the index of coincidence of a particular piece of text, take the text, rotate it by some random number of places, and write the rotated text underneath the original text.  For example, the sentence below is rotated by 58 characters and the rotated version appears beneath it.

alanturingbreakscodeslikenobodysbusinessbuthispersonallifesadlybecameeverybodysbusiness
dysbusinessbuthispersonallifesadlybecameeverybodysbusinessalanturingbreakscodeslikenobo

The index of coincidence is the number of places in which the same letter occurs in both strings of text.  In this example, there are 6 coincidences for 87 characters of text, a rate similar to the 6.6% of normal English text.  An important point to notice is that when a text is encrypted by shifting every letter by the same value, the index of coincidence remains the same.

Given an encryption of the same text with a  codeword we would expect to see an index of coincidence more like the 3.8% of random letters, unless we happen to have rotated by a multiple of the cycle length.  In the case where the rotation is a multiple of the cycle length, the letters that are lined up underneath each other are encoded using the same shift, and the usual English index of coincidence would be expected, because the index of coincidence is invariant under a single letter cipher.  Hence the way to determine the cycle length is to rotate the encrypted text by 1, 2, 3 etc. symbols, until we see an index of coincidence that looks more like 6.6% than 3.8%.

Program 3.  Write a program that calculates the cycle length of an encoding by searching for an index of coincidence close to 6.6%.  You can use your earlier programs to create test data.

Public Key Cryptography - A Revolution in Cryptography

Cryptographic methds do not have to be symmetrical!  Today a new kind of encoding is used which is called Public Key, one-way, or sometimes trapdoor cryptography.  In this new method, the whole world is able to encode information, but unless someone has more information, then they still cannot decode a message.

This new method is what allows e-commerce to flourish without fear of a security breach.  Here’s how this trapdoor technology is used.   Say I want to send my credit card to Amazon.com to buy a book, I encode my number with a publicly published method that anyone else could use, but only Amazon.com can decode it!

What if I don’t trust that I am actually talking to Amazon.com?  That is, I suspect that a thief posing as Amazon, sent me a fake encoding algorithm, and then they will decode it and get my number!  In that case, we do the process in reverse.  I ask Amazon to send me a message encoded with their secret encoding that anyone can decode with their public algorithm.  If I decode the message and it states, “Hi I am Amazon.com”, I know it had to come from them, because nobody else would have known how to encode it correctly.  This is called authentication and is described in nice detail by VeriSign, the largest digital signature provider, at http://www.verisign.com/docs/pk_intro.html.

The Mathematics Behind Public Key Cryptography

Ironically, this new extremely practical technology is based on one of the oldest branches of pure mathematics, famous for its beauty and scarcity of practical applications:  number theory.

The details of public-key cryptography are based on the RSA algorithm, by Rivest, Shamir, and Adelman.  We will review the mathematics here briefly here and end with a few experiments and programs.

Euclid’s Algorithm and Greatest Common Divisors

The greatest common divisor of two integers is the largest integer that divides both evenly.  For example, the greatest common divisor (gcd) of 24 and 15 is 3. One way to calculate the gcd of two numbers, a and b, is simply try all numbers starting from the smaller of the two down to one.  The first of these numbers that divides both a and b evenly, is the greatest common divisor.
Program 4.  Write a program to find the greatest common divisor this way.

Program 5.  Write a program to find the greatest common divisor by listing the prime factors for each number, and taking the intersection of the two lists.  A nice way to do this is to store the prime factors sorted in two arrays, and then use an algorithm that resembles merging.  For example, if the two numbers are 24 and 40, then the prime factors of 24 are: 2, 2, 2, 3, and the prime factors of 40 are:  2, 2, 2, 5.  The intersection of these two lists is 2, 2, 2, so the gcd is 2x2x2 = 8.

Although the methods used in both program 4 and 5 are intuitive and seem natural, neither is the best way to find greatest common divisors.  It will be important to us later to be able to find the greatest common divisor efficiently without factoring the given numbers.

Euclid (300 B.C.E.) invented a method to compute greatest common divisors that bears his name to this day.  In Euclid’s algorithm, we use recursion.  He proved that assuming a>b, then the gcd(a,b) = gcd(b, a mod b).  An example is shown below.

a
b
   
123
28
28
11
11
6
6
5
5
1
1
0
When the smaller number, b, equals 0, then the answer is a.  In this case, the gcd(123, 28) = 1.
Program 6.  Write a recursive program to implement Euclid’s algorithm.  Write an iterative program to implement Euclid’s algorithm.

Experiment.  Test your three programs on a variety of pairs of numbers, time them, and discuss the results.  Note that theoretically Euclid’s algorithm takes time proportional to the number of digits in the numbers, while the other two algorithms take time proportional to the numbers themselves (an exponential explosion in complexity).

Connection to Cryptography

It is not obvious that Euclid’s algorithm has a lot to do with RSA public-key cryptography, but it does.  You will need to be patient.

A theorem of Euclid that we will not prove here, states that if the gcd(a,b) = m, then there are two integers  x and y such that ax + by = m.  These values x and y can be computed constructively using Euclid’s algorithm.  Here is an example using the numbers from the previous example.

123  =  28(4) + 11
28  =  11(2) + 6
11  =  6(1) + 5
6  =   5(1) + 1
The greatest common divisor was 1.  Now we work our way backwards:
1  = 6 – 5(1)
1  = 6 – (11 – 6(1))(1)  = 6(2) – 11(1)
1  =  (28–11(2))(2) – 11(1)   = 28(2) – 11(5)
1 = 28(2) – (123 – 28(4))(5) = 28(22) – 123(5)
Each step gives us the x and y for the pair of numbers on that level, until finally we arrive back at the original pair of numbers 28 and 123, where the x and y are 22 and –5 respectively.  Study this example, and try to understand exactly what is going on.  (You may notice that the solutions for x and y are not unique.  In class, we will discuss more details and variations of this method.  But for the purposes of this lab, you won’t need to know anything more about it.)
Checkpoint:  To make sure you get the idea above, compute by hand, the integers x and y, such that ax + by = gcd(a,b) for each pair a, b below.
(99, 101)    (10, 35)      (7, 12)       (36, 42)

Although the next program is only three lines long, you will likely need a hint trying to figure out how to relate the computation of the next row’s x,y values to the previous row’s. 

Program 7.   Write a recursive program to calculate the integers x and y, such that ax + by = gcd(a,b) for a given pair a and b.  This program will be crucial for the RSA cryptography algorithm.  Try your program out on the numbers 233987973 and 41111687.

Hints for program 7:   If gcd(a,b) = a mod b, we are at the last row, and we can return (x,y) = (1, -(a/b) ).  Otherwise, we must recursively compute the function with the inputs (b, a mod b).  Call the outputs of this recursive call x' and y'.  Then the original function can return (x,y) = (y', x' - (a/b)y').  Try this on the example above for 28 and 123, and I will review the example in class if necessary.  Note also that an easier base case that goes one level deeper is "if b = 0, then the gcd is a, and you can return (x,y) = (1, 0).

 Public Key Cryptography – A First Attempt

This section describes a method that is a simple version of RSA, but has the drawback that the private key can be computed from the public key using program 6 that you wrote earlier.  Study this simple version first, because although it does not quite work because of the drawback mentioned, it will give you a chance to understand what is really going on before tackling RSA proper.

All the public key cryptographic methods encode one integer into another, so we assume that our messages are first converted to a sequence of integers.  The exact method of conversion is not trivial but it won’t concern us here.

To encode a number, we will need the public key.  This consists of two integers, for example 5 and 17.  The second integer must be prime, and the first must be relatively prime to the second integer minus one.  In this case, 17 is prime, and 5 is relatively prime to 16.  Recall that two numbers are relatively prime if and only if their greatest common divisor is equal to one.

Encoding and Decoding Numbers – Public and Private Keys

Now to encode a number, say 6, using this key.  All we do is calculate the remainder of 65 after dividing by 17.  You can check that this equals 7.  To decode 7 back into 6, we need to have the private key.  The private key also consist of two numbers, one of which is really public, namely the prime 17, and one of which is really private namely 13.  Then we calculate the remainder of 713 after dividing by 17, and we get 6.

 Where does the 13 come from?  Why does this work?

Thirteen is the solution to the equation 5x = 1 mod 16.

What happened here is that we computed 65(13) mod 17 = 713 mod 17 =  6 mod 17.  Since 5(13) = 1 mod 16, we can write 5(13) = 16(4) + 1.  This means that that 616(4) + 1 = 6 mod 17, and equivalently 616(4) = 1 mod 17.  It turns out that this must happen as long as 5(13) = 1 mod 16, and 5 has no common factors with 16.  There is a theorem to justify this.

Fermat’s Little Theorem

Pierre de Fermat, was an amateur but great mathematician of the early 17th century.  Fermat’s Little Theorem, not to be confused with his more famous Last Theorem states:

If p is a prime, and is not divisible by p, then p evenly divides ap-1 - 1.  For example, if p equals 17, and a equals 6, then 17 evenly divides 616 – 1, or 616 = 1 mod 17.  The proof is left for the discrete math class.   It is elegant but irrelevant for this lab.  It is the result that we need.  If 616 = 1 mod 17, then certainly 616(4) = 1 mod 17, and our encoding and decoding will work.

Computing Inverses

It doesn’t matter if you understand all the mathematics here, but you do need to understand how to calculate this number 13.  Recall that thirteen was the solution to the equation, 5x = 1 mod 16.  That means we must find two numbers x and y such that 5x – 16y = 1.  Sound familiar?  It is just Euclid’s algorithm!

The numbers 5 and 13 are called inverses of each other mod 16.  That means they multiply together to equal one.  The computation of an inverse is just what you did in program 6.

Now What?  If a person wanted to calculate the private key from the public key, they could do it by just using your program 6 and calculating the inverse of the private key mod 16.  What’s wrong is that it is possible to calculate inverses quickly and efficiently. You wrote a program to do it yourself!  So if someone publishes a public key (5, 17) you could have your program compute the private key (13, 17).  That is not good enough.  We want the decoding to be harder than the encoding.  We do not want anyone to be able to find 13 so easily.  Only the person who holds this private key can decode.

 A Second Attempt at Public Key Cryptography

The first attempt described above is based on work by Diffie and Hellman in 1976 (see http://www.verisign.com/docs/pk_intro.html) and was known before the contributions of Rivest, Shamir, and Ademan (RSA) in 1977. The RSA contribution is described next.
Now we start with two prime numbers, p and q, say 2 and 17, and compute their product pq = 34.  We calculate (p-1)(q-1) = 16, and we choose a value that has no common factors with 16, say 5.  The public key becomes the pair of numbers 34 and 5.

Encoding and Decoding

The encoding and decoding is done just like before.  For example, to encode 6, we compute 65 mod 34 = 24 and to decode 24 we compute 2413 mod 34 = 6.  As before, thirteen is the solution to the equation 5x = 1 mod 16.

The difference between this and our first attempt is that previously, 16 was simply one less than the public prime, so that the equation, 5x = 1 mod 16, was easily constructed (and solved by program 6).  But now the only way to calculate 16, is to factor the number 34 into 2 and 17 and compute (2-1)(17-1).  The factoring part is hard!  Nobody knows how to factor numbers quickly.  The best method is proportional to the given numbers.  Contrast this with the calculation of the inverse that is exponentially faster, proportional only to the number of digits in the given numbers.

Of course, anybody can factor 34, but in practice the two primes that are chosen are on the order of 100 digits each.  This makes all currently known factoring algorithms take centuries.  If you come up with an efficient algorithm (like the one for inverses) that can factor numbers, you will be famous!

Checkpoint
a.  Create your own public and private codes for doing RSA encrypting.  Make sure they satisfy all the appropriate constraints.
b. Assuming that each character in a message is represented by its ASCII value, encode the message “Too much work!”.

Program 8.  Write a program to encode numbers using RSA given the public key.  Make sure to use the fast modular exponentiation algorithm we discussed in class.  That will allow you to deal with very large numbers efficiently.  Raising x to the yth power can be done in time proportional to log y.  

Program 9.  Write a program that factors the public prime, into p and q, computes (p-1)(q-1), computes the inverse to find the private key, and decodes a message.  Use your program to solve the following problem.

Puzzle.  Cracking the UFO Message.
A public code is found etched on a rock on Mars:  (7, 1147).  The message {128, 1040, 129, 1144, 788, 735, 570, 875} is received from outer space on one of the billion machines running the Extraterrestrial Life Detector Screen Saver distributed among the world’s PC’s.  Assuming that this message was encrypted with the public code found on Mars, crack the code and decode the numbers.  (Note that this problem uses small primes, hence giving your program a fighting chance to succeed before we are all dead.)

Note that to use RSA on integers that are large which is the whole idea, you will need to use a bigint class, in order to avoid overflow.  Here is a stack class to include in bigint stack and here is another.  You can also just use longint for the examples below, but to proceed further you woul dneed a bigint class.

Cryptographic Decoding Challenges for Practice and Review

Challenge 1.

A code word less than six letters long gives an encoding of a well known cerebral song:
BQHIERPVBZXOPORHASACNFLQHBYSKFBBZKBHAHASYZHKXFLQHBLIEHBBZKBHAHASKOBB
PWMVMVXHACNUAHLWWPXHAWGYBBBQHIERUSTBHHASKZBBVCEBBTBCGZRVTRTPKOBB

What is the original text?

Challenge 2.

A code word less than six letters long gives an encoding of a Woody Allen math joke:
BOQWMNWOOQSSFHQKASRLAGOWRAADWRKAONCIOHKJKCYHVYWHLLC

What is the text?

Challenge 3.

RSA encoding of nine ascii values, using public key (10555, 21971), gives the name of a hunter:
16912    19531    20676    16912    6613    2348    17835    15770    15770

What is the original text?

Challenge 4.

RSA encoding using public key (5555551, 118513313) gives:
80217189    107242213    96490860    79543571    25953566

What are the original numbers?

Challenge 5.

Factor any of the RSA challenge numbers at RSA-Security
Read about a group of people called Bovine RC5 who successfully crack RSA keys.


Enrichment

We will screen the movie Breaking the Code showing the rold of Alan Turing in the British decryption of German codes in World War II.

We will run some friendly competitions to encrypt and decrypt various kinds of codes.  Prizes for the winners who crack the most challenges above and in competition.