MESSAGE AUTHENTICATION

MESSAGE AUTHENTICATION
USING QUADRATIC RESIDUES
Frank Rubin
January 31, 1995

Abstract: This paper shows a method for authenticating messages based on quadratic residues. The method will detect accidental or deliberate changes to a message, and will verify the sender of the message, both with near certainty. It does not require any preliminary exchange of messages, and does not require publishing any additional data besides each user's public key.

Keywords: cryptography, cryptology, digital signature, message authentication, prime number, factorization, quadratic residue, public key, checksum, hashing.

INTRODUCTION

       A recent issue of Cryptologia had an article by Shepherd, Sanders and Stockel [4] on the quadratic residue cipher. One of the drawbacks they mention for this cipher is that no method is known to obtain a digital signature using quadratic residues.

       In the same issue there was an article by Christoph Ruland [3] on digital signatures. The methods for digital signatures in that article require publishing a small amount of data for very short messages, like 1 or 2 bits, but the amount of published data increases greatly for longer messages, as does the amount of authentication data that must be transmitted with the message. The published authentication data can be used only once, and must be replaced for each successive message.

       The timing of replacing the authentication values is critical. If they are replaced before the receiver gets the message and retrieves the authentication data, then the receiver cannot know if the message is genuine. If they are replaced any later, then the receiver could pose as the sender, and send the message with its still-valid digital signature to a third party. Therefore the authentication values must be replaced at exactly the time that they are retrieved. This makes it impossible to prove to a third party later that a message is genuine.

       In a companion article [2], I describe a protocol for obtaining digital signatures using the quadratic residue cipher. This protocol involves sending a preliminary message from the sender to the intended recipient, receiving a reply, and then sending the authenticated message. It also depends on having the authenticated message enciphered with the receiver's public key in order to detect alteration of the message by an enemy.

       In this article I describe a method of authentication based on quadratic residues that does not require any preliminary messages, and does not depend on encryption. It does not require publishing any additional data besides each user's public key, and therefore does not require replacing authentication keys, or providing multiple keys for successive messages. The size of the authentication key does not depend on the length of the message. This authentication method may be incorporated into a public key cryptographic network with no additional files or system modifications. It could be incorporated into a conventional secret key cryptosystem, or into an unencrypted communications system by adding a universally accessible file containing each user's public authentication key.

DIGITAL SIGNATURES

       Suppose a sender S wishes to send a message M to a receiver R in a possibly hostile environment where enemies may intercept the message and perhaps alter it, substitute an entirely different message for the genuine message, or even insert bogus messages of their own into the message stream. In these circumstances, the receiver R needs to know two things, that the message was actually sent by the sender S, and that M is the genuine message, not an altered or substituted message. It is the function of a digital signature to provide this dual authentication.

       It is not adequate for S simply to append a special sequence of bits to the message, since an enemy could replace the genuine message, and copy the special bit sequence. Changing the special bits for each message in some preset order will not help, because the enemy could insert the bogus message in correct sequence into the message stream. To provide reliable authentication, the digital signature must depend on every bit in the message, so that even a single bit change could be detected.

       In the balance of this paper I will describe a digital signature which is achieved in two steps. First, the message is hashed in such a way that the hash value depends on every bit of the message. Then an authentication key is derived from the hash value in such a way that only the legitimate sender can determine the key, but any receiver can verify it.

QUADRATIC RESIDUES

       The quadratic residue of an integer M modulo N is simply the residue or remainder M' = M² (mod N). It has been shown by Blum, Blum and Shub [1] that successive quadratic residues, that is the sequence M², M⁴, M⁸, ... (mod N), generate a highly random and unpredictable sequence. If N = pq, with p and q both prime and =3(mod 4), and n is a quadratic residue modulo p, then ±n^(p+1)/4 (mod p) are the two square roots of n modulo p. Likewise for quadratic residues modulo q. Anyone who knows the factorization of N can now determine M from M' as follows: if M = a (mod p) and M = b (mod q) let p^* denote the multiplicative inverse of p modulo q, that is pp^* = 1 (mod q), then M = (b-a)pp^*+a (mod N). Since there are two possible values for both a and b there will be four possible square roots of M'. (Fewer if a or b is 0, meaning that M' is a multiple of p or q.)

       If N has been chosen large enough so that it cannot be factored, then anyone who does not know the factorization of N cannot determine M from M'. The largest general number that has been factored to date has 129 decimal digits. We will therefore assume that p and q each have at least 50 decimal digits, and that N has at least 130 decimal digits.

       In a public key cryptographic system using quadratic residues, each possible recipient has already chosen a value N meeting those criteria. (Actually, much stricter criteria. Some of these criteria will be introduced later.) This is called the recipient's public key, and is published, so that every user of the system has access to every other user's public key.

HASHING

       Hashing is any method of deriving a fixed-length value from a variable-length string. For example, the length of the string is a hash value, and the the sum of the bytes in the string is another hash. Hashing is often used to detect changes in a string. For example, if a string of bytes is transmitted, followed by the sum of the bytes in the string, then any change to a single byte would cause the sum to change, so the recipient would know there had been a transmission error. Such a simple hash, however, would not detect any transposition of message bytes, nor any change to several bytes whose sum was 0.

       A more sophisticated hash would be to divide the message into fixed-sized b-bit blocks B₁, B₂, B₃, ..., and then to compute the sum C₁B₁ + C₂B₂ + C₃B₃ + ..., where C₁, C₂, C₃, ... is a fixed sequence of b-bit pseudo-random coefficients. The hash value would be the low-order b bits of the sum. This hash will detect most any change in the message, including transpositions and multiple-block changes, with probability 1-2^-b. When b is large, this approaches certainty.

       There are two drawbacks to this hash method. First, storing the sequence of coefficients would take a lot of space, even if all users used the same coefficients. Second, this hash would not be safe against intentional alteration of the message, since an enemy could change one or more blocks of the message, then adjust some other block to cause the hash value to match. The required value is easily calculated.

       The problem of storage is easily solved. A standard method of generating pseudo-random numbers is to take consecutive powers of some integer modulo a given prime. In our case, N is not prime, but is the product of two large primes p and q. By putting some simple restrictions on p and q, we can get an adequate sequence. Consider the sequence D, D², D³, ... modulo N. The period of this sequence always divides í(N), the Euler phi function, which denotes the number of integers from 1 to N mutually prime to N. If p = 2p'+1 and q = 2q'+1 with p' and q' both prime, then í(N) = 4p'q'. The period of D will be either p'q' or 2p'q' unless D is congruent to 0, 1, or -1 (mod p) or to 0, 1, or -1 (mod q). Since p and q are so large, any D chosen at random is nearly certain to have a period of p'q' or 2p'q'. We may either publish the pair (N,D) for each user of the public key system, or else derive D in some systematic way from N. One possible choice would be to let D = 2^b where b+1 is the number of bits in N. This choice means that multiplication by D can be accomplished by shifting.

       The second problem, where the enemy changes the value of one block of the message to balance changes to other blocks, is solved by a careful selection of the hash function. We can use quadratic residues to construct a hash function that is resistant to such manipulation. Suppose that N is b+1 bits long. Break the message M into blocks M₁, M₂, M₃, ..., M_k of b bits each. The last block may be short. Now construct the following sequence of hash values:

(1)	H₀ = Q,
	H₁ = D·H₀ + B₁²	(mod N),
	H₂ = D·H₁ + B₂²	(mod N),
	...
	H_k = D·H_k-1 + B_k²	(mod N),
	H = H_k.

where Q is any large number such that neither Q nor DQ is close to a square, say within 10⁵⁰ of a square. For example, if D is always of the form 2^b, we might choose Q = 3²⁴⁹. This hash calculates the value

(2)

D^kQ + D^k-1B₁² + D^k-2B₂² + ... + DB_k-1² + B_k²

(mod N)

Now, suppose we have chosen D, computed the hash value, sent the message, and an enemy wishes to tamper with the message. Suppose, in particular, that the enemy wishes to change the value of block i, and then would like to change block j to make the hash value work out. The enemy can compute the required value of all terms in the expression (2), and therefore can determine the value of B_j². This does not allow the enemy to compute the value of B_j, however, since only the sender can do this.

There is one minor flaw that needs to be addressed. In general, there are 4 possible values for the block B_i that can produce the value B_i². One is B_i, two of them require knowledge of p and q to determine, but the fourth choice, N-B_i will be known to the enemy. It is conceivable that N-B_i is a meaningful message, and the enemy might desire to change B_i to N-B_i. To guard against this small exposure, we adopt a convention. Note that one of B_i and N-B_i must be less than N/2, while the other is greater than N/2. If B_i is greater than N/2, we will replace B_i² in the hash by N-B_i². If x is a quadratic residue, then N-x cannot be a quadratic residue. The desired hashing function is then

(3)	H₀ = Q,
	H₁ = D·H₀ + S₁	(mod N),
	H₂ = D·H₁ + S₂	(mod N),
	...
	H_k = D·H_k-1 + S_k	(mod N),
	H = H_k.

where

S_i =	B_i²	if B_i < N/2,
S_i =	N-B_i²	if B_i > N/2.

Since N is odd, B_i cannot equal N/2.

PROTECTING THE HASH VALUE

       The scheme outlined above will provide reliable authentication as long as there is a means to assure that the hash value is transmitted unchanged. One way is to have an alternate super-secure method of transmission, but this begs the issue, because then you could send all of your messages by the super-secure network, and not bother with any authentication methods.

       Instead, we would like the sender S to compute the square root of H modulo S's public key, and append this to the message. This would be the authentication key, or digital signature, for the message.

       When the message is received, the recipient R would calculate the hash values as above (3). R would then take the quadratic residue of the authentication key modulo the sender's public key. If this matches the hash value, then the message is authenticated. It cannot be faked, because an enemy has no way of computing the square root of the final hash value. It cannot be altered because any change to the message would alter the final hash value. Although an enemy can easily compute the hash value for any message, it is not feasible to obtain the corresponding key.

       Unfortunately, the situation is not this simple. The hash value might not have a square root modulo N. That is, the number H might not be a quadratic residue. A given quadratic residue will have 1 square root if it is a multiple of both p and q (i.e., it is 0), 2 square roots if it is a multiple of either p or q, but not both, and 4 square roots in the most common case, where it is a multiple of neither p nor q. This means that a given number has about a 1/4 probability of being a quadratic residue modulo N. Consequently, the sender may begin by attempting the square root of H. If this does not have a square root, then H+1 is tried, then H+2, and so forth. There is a 99.99% chance that one of the numbers from H to H+31, inclusive, will have a square root.

       The sender, therefore, will find the first number H+n which has a square root, and append that square root to the message. The recipient will square this authentication key, and check it against the hash value. If it agrees within some preset limit, such as 50, then the message is accepted as authentic.

       This gives an enemy a slight decrease in the amount of work needed to construct a bogus message. Suppose that the enemy picks a message, and computes H for that message. The enemy now needs to find a value V whose quadratic residue lies in the range H to H+49. On average, there will be 12.5 quadratic residues within this range, and each will have 4 possible square roots. So there will be 50 possible values for V. A search for 50 values in a search space of 10¹³⁰ is still infeasible.

       Regardless of the acceptance limit, if there is enough message traffic, then eventually a valid message will occur that would not be accepted. Fortunately, this would be detected by the sender prior to transmission. In such a case the message could be altered in some fashion, say by appending a null character. If the message contained a date/time stamp, then merely waiting until the time stamp changed would be sufficient.

       It might appear that an enemy could produce a bogus message by a square-root attack, as follows: First set B_k=0, and let V be the next integer above sqrt(H). Then choose B_k to be the integer part of sqrt(V²-H). (In this paragraph, sqrt indicates arithmetic square root, rather than square root modulo N.) This gives a new value of H which one might think could satisfy V²-H < 50. There is virtually no chance of this succeeding, for B_k will be on the order of sqrt(V), V is sqrt(H), and H is of the same order as N. So for N > 10¹³⁰, B_k will be on the order of 10³². Now V²-H can lie anywhere between B_k² and (B_k+1)², a span of 2B_k+1 or about 2x10³² numbers. The probability of V²-H lying in the range B_k² to B_k²+49 is thus about 50/(2x10³²) or less than 10^-30.

COMPOSITE AUTHENTICATION VALUE

       There is one danger remaining: that an enemy could duplicate the entire message and insert it into the message stream at another time. For this reason, we require that every message must be unique. In circumstances where identical messages might occur, uniqueness can be provided by including a message sequence number or a date/time stamp in each message, or by stepping H₀ in some systematic way.

       Another method of dealing with the problem of an enemy duplicating a legitimate message is to use a composite authentication value A in place of H. A could contain the identity of the sender, the identity of the recipient, a message sequence number, plus enough bits from H to assure that the enemy cannot tamper with the message before it reaches the receiver.

       80 bits is currently sufficient for this purpose. Even if an enemy could check 10¹⁰ messages per second, it would take an average of 19 million years to find a message which produced the same hash value. (The 80-bit hash size could increase over time, just as the size of N must increase as computers get faster and factoring algorithms are improved.) H could be folded to 80 bits by exclusive-oring 80-bit sections of it, starting from the low-order end.

       Finally, the composite authentication value A could have 6 low-order 0 bits. This would allow the process of stepping A, A+1, A+2, ... to leave the b-6 high-order bits unchanged. The probability that one of the values A through A+63 is a quadratic residue is about 0.99999998991. A 130 decimal digit modulus is large enough to allow two 19-character ID fields (for sender and receiver), a 4-byte message sequence number, a 10-byte hash value, and a low-order 0 byte.

       The sender steps the low-order byte of the composite authentication key until the entire key is a quadratic residue. Then the sender finds any of the four square roots of this key and appends that root to the message. The receiver squares this modulo N (the sender's public key), and compares each field to its expected value. The sender ID would have to correspond to the modulus N, the receiver ID must match the receiver, the message sequence number needs to be 1 greater than the last message sequence number from the same sender, and the hash value would have to match the value calculated by the receiver from the message.

       With this composite authentication value it is certain that the message comes from the purported sender, was sent to the alleged receiver, and has not been altered.

       Once the message has been received and authenticated, it must be securely logged and time-stamped so that the sender cannot subsequently publish the authentication key and then claim that the receiver had faked the message. The time stamp assures that the message was authentic when received.

CONCLUSION

       Quadratic residues can be used to obtain digital signatures by a two-step process, first hashing the message to get a hash value that depends on every bit of the message, and then taking an approximate square root modulo N to get an authentication key that cannot be produced by anyone except the legitimate sender. A composite authentication value guarantees that a message has been sent unchanged from the specified sender to the specified receiver.

REFERENCES

1	L. Blum, M. Blum and M. Shub 1986. A Simple Unpredictable Pseudorandom Number Generator. SIAM Journal on Computing 15(2): 364-383.
2	F. Rubin 1995. The Quadratic Residue and Double Quadratic Residue Ciphers. Cryptologia 19(3): 275-284.
3	C. Ruland 1993. Realizing Digital Signatures with One-Way Hash Functions. Cryptologia 17(3): 285-300.
4	S. J. Shepherd, P. W. Sanders and C. T. Stockel 1993. The Quadratic Residue Cipher and Some Notes on Implementation. Cryptologia 17(3): 264-282.