This research will analyze an algorithm for integer factorization based on the use of continued fractions and quadratic forms, primarily intending to produce a runtime analysis of the algorithm but also proving several valuable results about continued fractions. This paper provides some background to the problem and a description of the algorithm.
There are several different kinds of factorization; this research will focus on integer factorization. Consider an integer . Factorization is the process of finding integers and greater than such that . Complete factorization would require repeating this process for both and until all of the remaining factors are prime (i.e irreducible). However, if an algorithm can be developed to quickly factor into and , the same algorithm can be used over again on and . For example, it is easy to see that and then repeat to factor . From here, you would see that since , . Although in this simple case, the complete factorization is easy to find, this task becomes much harder for large numbers. In number theory, the factors of a number dictate many of the characteristics of the number. For example, Euler's function, which tells how many numbers less than are relatively prime1 to , can be directly calculated from the complete factorization. In the example above, there are integers less than that are relatively prime to . Also, determining whether a number is a quadratic residue (i.e. the square of another number modulo ), can be determined directly using Gauss's quadratic reciprocity law if the complete factorization of is known. In the above example, , so that is a quadratic residue modulo with square root . One would represent this as . With the complete factorization, we can use the law of reciprocity to analyze modulo ,, and to determine whether or not is a quadratic residue without actually having to find its square root first [Ga]. Therefore, the ability to factor large numbers has been a focus of research for a variety of theoretical reasons.
One practical application of factorization is cryptography. In one system of encrypting a message, the RSA system, the cryptographer chooses a number that is the product of two large numbers and that are presumed to be prime: . Then an exponent is chosen such that is relatively prime to . Although there are several variations, in the normal public key version, and are made public. The user of the RSA system then privately calculates , the inverse of modulo . Anyone is able to encrypt something to him by raising blocks of the message to the power , modulo N: , where is the original message and is the encrypted message. Then, the recipient is able to decrypt by evaluating [Riv]. As long as someone intercepting the message is unable to factor , it is usually impossible to obtain , so that the message cannot be broken. The security of this system and its variations depends highly on whether or not can be factored [T]. Although fast factorization would be a threat to this system, the advance in number theory produced by fast factorization would likely provide a number of alternative secure systems.
In addition to the potential for alternative secure systems, there is also the possibility that a fast factorization algorithm might not work for all numbers. Therefore, there could be some numbers for which factorization might be easier than others. If there are classes of numbers that a fast factorization algorithm does not work on, this would allow designers of the algorithm to increase their security by relying more on these numbers. Regardless of whether or not the algorithm works for all numbers or provides alternative systems, for security purposes it is necessary to understand the strengths and weaknesses of the system.
Up to now we have referred to fast factorization in general terms, but there are several different ways to classify the speed of an algorithm. Let be the number to factor. Let , the number of bits in . An algorithm's run time is called ``exponential" if it increases exponentially with the number of bits . ``Linear" refers to an algorithm where the time increases proportionally to the number of bits2. ``Polynomial" refers to an algorithm for which the time required is some polynomial function of . Thus, linear time is a special case of polynomial time. There are some algorithms that fall in between polynomial and exponential time and are referred to as sub-exponential. Currently, the best general purpose factorization algorithm is the general number field sieve, with a runtime of [L]. The in the exponent of has a very significant effect on the runtime of the algorithm, as this determines that the algorithm is sub-exponential.
Many of the other factorization algorithms are important for theoretical reasons. One tool used by several different algorithms is the continued fraction expression for , where is the number to be factored. This expression is calculated recursively:
It is important to note that at each step in the continued fraction expansion, it is possible to define integers and such that , where . Also, in the expansion of , , is a quadratic residue modulo . Both of these were proven in Hans Riesel [Rie] and are included in Theorem 1 of section 6. The sequence of 's are thus referred to as pseudo-squares. The sequence of 's are referred to as residues, since they are produced at each step by reducing the fraction by some integer. Several algorithms, most notably the Morrison - Brillhart algorithm, rely on the quadratic residues in the sequence of pseudo-squares. In 1982 Daniel Shanks developed but never published3 an algorithm called Square Forms Factorization, or SQUFOF, which takes a more direct approach to using these quadratic residues.
For SQUFOF, when the algorithm encountered a perfect square, it started a new sequence from that term by taking the conjugate of the numerator and the square root of the denominator and then continued the expansion until a residue repeated consecutively (Figure 1). At this point, the repeated residue provided a factor of N.
For example, let be :
The second fraction in each step is found by rationalizing. At each step, the integers taken out are and the remaining fractions are between and . After subtracting the remaining fraction is inverted to find . As a point of reference, we have approximated so far that . SQUFOF stops here because is a perfect square. Taking the conjugate of the top and the square root of the bottom, we obtain:
Here, since the residue is repeated, we quickly find that is a factor of . . The explanation of this can be seen by applying cross multiplication of the two fractions on the left of (3) and simplifying. This produces the equation , as in (a) of Theorem 1 in section 6. Since the residue is repeated, it must have a factor in common with the pseudo-square , because of the recurrence relation in (b) of Theorem 1 below. This common factor must then also divide .
Shanks also showed that the second part of SQUFOF (the process of finding a factorization once the perfect square has been found) is linear using a process called composition of quadratic forms. Composition of quadratic forms has several different definitions. Originally, Gauss defined it as the process of multiplying two quadratic forms together and making a substitution to reduce the product to another quadratic form [Ga]. Therefore, I will use the multiplication symbol for composition. However, this computation is slow and complicated. Shanks developed an approximation to this that relies on an extended Euclidean algorithm and the Chinese remainder theorem (Figure 2). This is much faster and simpler and is useful for a variety of theoretical reasons. Shanks also developed two algorithms for performing a partial reduction before the composition was completed: NUCOMP and NUDUPL. NUCOMP composes two different quadratic forms while NUDUPL composes a quadratic form with itself [S]. These algorithms, although slightly faster, are too complicated to explain here.
From the second line of (2), observe that . The quadratic form for this step would be , abbreviated . In general, the quadratic form for each step is . Given a quadratic form , setting returns to the continued fraction expansion. The result of composition is another quadratic form similarly related to somewhere else in the continued fraction expansion. Regardless of the composition algorithm chosen, the results are within several steps of each other in the continued fraction expansion.
The results of composition are still often outside the bounds normally obtained in the continued fraction expansion, so it is often necessary to reduce this product further. Shanks followed an algorithm designed by Gauss that uses substitutions. However, this algorithm also sometimes reverses the direction of the quadratic form, a trait that is undesirable for this research for reasons that will become apparent later, so quadratic forms will instead be converted back into a step in the continued fraction expansion in order to reduce them. For example, if we compose with itself using NUDUPL, we obtain . In order to reduce this, we write out the step in the expansion that it would represent and proceed from there:
In the fourth step, , , and , so that the quadratic form is completely reduced4. The new quadratic form is found from the fourth step to be . This same step could be found in the 7th step of the continued fraction expansion if composition of quadratic forms were not used. The number of terms skipped is larger for a quadratic form farther down in the expansion and can be roughly approximated, so that if it is known which step in the process is desired, composition of quadratic forms gets close enough that the answer can be quickly found. Although the first phase of Shanks' SQUFOF algorithm, the process of finding a perfect square, still requires exponential time, the total time is roughly cut in half [Rie].