ONE-TIME PAD CRYPTOGRAPHY
Frank Rubin
January 13, 1997


Abstract:   This paper shows how a moderate amount of random key stream can be used to generate a lifetime supply of keys for one-time pads. It further shows how arbitrarily many parties can correspond using the same random key, without compromising one another's communications. The net effect is to make the unbreakable one-time pad practical for widespread use.

Keywords:   cryptography, cryptology, one-time pad, data security, autokey cipher, key enhancement.


INTRODUCTION

       There are two eternal verities in cryptography: the only absolutely secure cipher is the one-time pad, and the one-time pad cannot be achieved in practice. The goal of this paper is to show how the one-time pad can be accomplished.

       The term "one-time pad" refers to any method of encryption where each byte of the plaintext is enciphered using one byte of the key stream, and each key byte is used one time then never used again. The key stream for a one-time pad must be a true-random stream, meaning that every key byte can take any of the values 0 to 255 with equal likelihood, and independently of the values of all other key bytes. By contrast, in a pseudo-random key stream the value of each byte after the first several is mathematically derived from the values of a few preceding bytes.

       The practical difficulty of using a one-time pad is that the key bytes cannot be reused. This means that even for a two-way exchange of messages, each party must have a sufficient supply of key material on hand so that they cannot run out before more key stream can be furnished. For N-way exchanges, the amount of key required increases quadratically.

       Many papers on cryptography assume that the only possible way out is for the key to be generated by some mathematical algorithm, and then expend a great deal of cleverness in finding an algorithm that adequately approximates the properties of a true-random key. The paper [1] gives a broad survey of such methods. The weakness of such methods is that determining a portion of the key stream will allow an opponent to reconstruct the entire key stream using the same mathematical algorithm.

       This paper presents a solution intermediate between a pure random key and a pure mathematical generator.

ONE-TIME PAD

       One-time pad encipherment can be denoted as

      Ci = E (Pi, Ki)

where E is the enciphering operation, Pi is the i-th character of the plaintext, Ki is the i-th byte of the key used for this particular message, and Ci is the i-th character of the resulting ciphertext. Both the key stream K and the enciphering operation E are secret. The key for each individual message is the starting location in the entire random key stream used for this encipherment.

       For efficiency, it is good practice to start each message near the position following the key byte used for the last character of the previous message. This eliminates the need to keep track of which portions have been used, and removes the danger that a message will be longer than any of the remaining segments.

       Although the term byte has been used, and will continue to be used for each unit of the key stream, it is not necessary that each unit of the key be 8 bits long. In fact, a key with larger units gives protection against a known-plaintext attack, since then there may be several values of the key which produce the given ciphertext byte from the known plaintext byte. Similarly, the plaintext could be enciphered in units other than 8-bit bytes. For present purposes, however, encipherment based on bytes is sufficiently general.

       In principle, the encipherment E can be achieved by simple look-up in a 256·2k-byte table, where k is the size of each key unit. Each column of the table, corresponding to one value of Ki is a permutation of the values 0 to 255. This makes it possible for the receiver to decipher the message uniquely. The cipher will be most secure if each of these 2k permutations is statistically independent of the others and of the identity permutation, and if the enciphering operation is implemented so that the waveforms of the bits of Ci give no clue as to the values of Pi and Ki.

       The reason that no portion of the key stream may be reused is that an opponent can detect reuse by the simple Index of Coincidence statistical test. Once the opponent finds two or more portions of messages enciphered with the same part of the key, it becomes possible to decipher those portions.

KEY ENHANCEMENT

       Suppose that two parties wish to correspond in secret over potentially compromised channels. They could communicate using a one-time pad, as above, provided that they could obtain a sufficient supply of key stream bytes, or that they could resupply themselves with key bytes at a rate fast enough to prevent ever running out. Note, however, that the total amount of key used over time equals the total amount of message material. One byte of key for each byte of message. So, if they had a secure way of resupplying the key, perhaps they could send their messages this way, and skip encryption altogether.

       Suppose, then, that the parties do not have an endless supply of key bytes, but a limited supply, at least enough for the longest possible message, but perhaps only enough for one day or even one hour of communication. The question is: how can their supply be stretched to last a week, or a year, or until new key material can be distributed?

       There are many ways to generate a new sequence from a given sequence. Indeed, this is precisely the problem that cryptography addresses. Even such a simple expedient as adding a constant value to each key byte Ki (the Caesar cipher) could be effective if the enciphering function E is strong enough. It is safer, though, to choose a method of enhancing the key which does not depend on either the strength or the secrecy of E. One such method is the autokey.

AUTOKEY

       Autokey is a very simple and fast method of encipherment. It has the valuable advantage that each byte of the new key stream has no correlation or fixed relationship with the corresponding byte in the original key stream. Let A and B be two independent simple substitutions. The new key stream L is given by

    L1 = A(K1) + B(Kn),
    Li = A(Ki) + B(Li-1)     for i = 2, 3, ..., n,

where the addition is modulo 256, and n is the length of the old key stream.

       Greater strength can be obtained by repeating the autokey two or more times in order to "cover its tracks." This assures that in the final iteration none of the original bytes of the K key stream participate. It is important that the two substitutions A and B are strongly non-linear in the sense that there is no a priori correlation between any subsets of the bits of x and A(x) or B(x).

       Since this autokey can be inverted, or reversed, the resulting L key stream will also be true-random. If it were not, then the inversion process would be a deterministic algorithm for producing true-random numbers from nonrandom numbers, which is impossible. By definition, random numbers generated by a deterministic algorithm are pseudo-random.

       There must be sufficiently many choices of A and B to prevent the correspondents from ever running out of keys. If A and B are determined by permuting the values 0 to 255 using a pseudo-random number generator, and this generator has a 40-bit seed, then there are 280 possible L key streams. Even if each L stream were used for only one second of communications this supply would last indefinitely. Indeed, if each new key were chosen randomly, rather than systematically, the chance of ever choosing the same L stream twice is negligible.

       A new key stream may be generated in bulk fashion using the entire K key stream, or it may be generated as needed using just the portion of the key stream required for the next message. For very long messages, the autokey can be applied to one block of the K stream at a time as it is retrieved from secondary storage, provided that message keys are not restricted to block boundaries.

EXHAUSTIVE TRIAL

       Since the K key stream is true-random, no method of cryptanalyzing L to recover K would work except exhaustive trial. In order to decide whether 280 possible choices of A and B are adequate for security, any known or potential exhaustive trial attacks should be examined.

       Assume that an opponent has a copy of the encryption device or software, and therefore knows the E operation. Suppose, further, that the opponent knows both the ciphertext and plaintext for some given message. This means that the E encipherment can be stripped off, yielding a portion of the L key stream. It is not feasible for an opponent to try all 280 combinations for the two substitutions, and even if that were possible, the K stream is true-random, and there is no way to distinguish the correct value.

       Imagine that the messages sent today will be of value 20 or even 30 years hence, when it might be possible to try 280 keys. The opponent would still need to have two different messages known to be enciphered with the same part of the key stream, and compare the 280 possible K streams for each one. A sample of 12 bytes from each would be sufficient. However, this requires storage on the order of 1025 bytes, which seems well beyond what can be projected even 30 years ahead. The autokey, therefore, seems secure against any conceivable attack that could be mounted in the foreseeable future. Of course the key size should be reviewed periodically as computers get larger and faster.

       The autokey method allows a relatively small amount of true-random key material to stretch a long, long way. Hamburger helper.

       One warning is necessary: the one-time pad method still depends on the opponent not knowing where in the key stream the key for a specific message begins. The key stream needs to be long enough to make this non-trivial. If P is the peak amount of message material expected during the time period when an L key is in use, Q is the maximum length of any message, R is the maximum number of messages expected during the period, and G the average gap left between messages in the key stream, then a good rule of thumb would be to make the key stream at least P+Q+RG+106 bytes long.

N-WAY COMMUNICATION

       This technique can be extended to N-way communications by using a layered approach. Suppose that there are N parties (people or stations) that wish to communicate, and that all possible pairs of parties may potentially need to exchange messages at some time.

       It is not sufficiently secure simply to supply all of the participants with the same true-random key stream, and let them apply the autokey method above to generate more key material. Two different parties may use exactly the same key and autokey, thus creating two messages with the same effective key. Such messages will begin appearing when the number of L streams that have been used reaches about 1.4x240. It is considered feasible to record and store 1.4x240 messages. An opponent monitoring all pairs of correspondents may detect this, and recover part of the effective key, that is, the L key. This portion might solve messages between these or other pairs.

       One solution is to control carefully which keys each pair of correspondents use to generate their L streams. This poses a huge administrative burden if the number of users is high, especially if users frequently join and leave the system. Worse, if the range of keys alloted to a particular user pair becomes known, then exhaustive trial of that limited set may be feasible, perhaps even easy.

       A better solution is to use a layered approach. Each pair of correspondents could have a unique autokey used to produce their L key stream, then they would apply a second autokey to produce the key stream for a specific time period. This way, neither the K key stream itself nor any derived L key stream is ever used as the key of a message.

       If the K stream were discovered by espionage, that would not compromise the communications of any pair. Similarly, no individual with legitimate access to the K stream could eavesdrop on any of the other parties.

       If the L stream for a particular pair of parties, or even for several pairs were discovered by espionage, that would not compromise the communications of any other pair, because their L stream would not be known. Futhermore, it would not compromise the communications of the affected pairs because it would not reveal their key streams for any given time period.

       Using this layered approach, a small amount of random key can serve a large number of correspondents over an indefinite period. Loaves and fishes. In fact, there are so many choices for the A and B substitutions that the key can never be exhausted, no matter how many parties join the network. Burning bush.

CONCLUSIONS

       This paper has shown a practical method for implementing one-time pad cryptography. Starting with a limited supply of a true-random key, a simple autokey cipher is used to generate additional random key material indefinitely. Using a layered approach, the same random key can supply any number of correspondents indefinitely with message keys for use in a one-time pad. A break in the communications of one or more pairs of correspondents will not compromise the security of any other pairs.


REFERENCE
1 Ritter, Terry 1991. The Efficient Generation of Cryptographic Confusion Sequences. Cryptologia 15: 81-139.