next up previous contents


Background

This research will analyze an algorithm for integer factorization based on the use of continued fractions and quadratic forms, primarily intending to produce a runtime analysis of the algorithm but also proving several valuable results about continued fractions. This paper provides some background to the problem and a description of the algorithm.

There are several different kinds of factorization; this research will focus on integer factorization. Consider an integer $N$. Factorization is the process of finding integers $p$ and $q$ greater than $1$ such that $N = pq$. Complete factorization would require repeating this process for both $p$ and $q$ until all of the remaining factors are prime (i.e irreducible). However, if an algorithm can be developed to quickly factor $N$ into $p$ and $q$, the same algorithm can be used over again on $p$ and $q$. For example, it is easy to see that $105 = 5 \cdot 21$ and then repeat to factor $21$. From here, you would see that since $21 = 3 \cdot 7$, $105 = 5 \cdot 3 \cdot 7$. Although in this simple case, the complete factorization is easy to find, this task becomes much harder for large numbers. In number theory, the factors of a number dictate many of the characteristics of the number. For example, Euler's $\phi$ function, which tells how many numbers less than $N$ are relatively prime1 to $N$, can be directly calculated from the complete factorization. In the example above, there are $(5-1)(3-1)(7-1) = 48$ integers less than $105$ that are relatively prime to $105$. Also, determining whether a number is a quadratic residue (i.e. the square of another number modulo $N$), can be determined directly using Gauss's quadratic reciprocity law if the complete factorization of $N$ is known. In the above example, $19^2 = 361 = 46+3 \cdot 105$, so that $46$ is a quadratic residue modulo $105$ with square root $19$. One would represent this as $19^2 \equiv 46 \pmod {105}$. With the complete factorization, we can use the law of reciprocity to analyze $46$ modulo $3$,$5$, and $7$ to determine whether or not $46$ is a quadratic residue without actually having to find its square root first [Ga]. Therefore, the ability to factor large numbers has been a focus of research for a variety of theoretical reasons.

One practical application of factorization is cryptography. In one system of encrypting a message, the RSA system, the cryptographer chooses a number that is the product of two large numbers $p$ and $q$ that are presumed to be prime: $N = pq$. Then an exponent $e$ is chosen such that $e$ is relatively prime to $(p-1)(q-1)$. Although there are several variations, in the normal public key version, $N$ and $e$ are made public. The user of the RSA system then privately calculates $d$, the inverse of $e$ modulo $(p-1)(q-1)$. Anyone is able to encrypt something to him by raising blocks of the message to the power $e$, modulo N: $c \equiv m^e \pmod N$, where $m$ is the original message and $c$ is the encrypted message. Then, the recipient is able to decrypt by evaluating $m \equiv c^d\pmod N$ [Riv]. As long as someone intercepting the message is unable to factor $N$, it is usually impossible to obtain $d$, so that the message cannot be broken. The security of this system and its variations depends highly on whether or not $N$ can be factored [T]. Although fast factorization would be a threat to this system, the advance in number theory produced by fast factorization would likely provide a number of alternative secure systems.

In addition to the potential for alternative secure systems, there is also the possibility that a fast factorization algorithm might not work for all numbers. Therefore, there could be some numbers for which factorization might be easier than others. If there are classes of numbers that a fast factorization algorithm does not work on, this would allow designers of the algorithm to increase their security by relying more on these numbers. Regardless of whether or not the algorithm works for all numbers or provides alternative systems, for security purposes it is necessary to understand the strengths and weaknesses of the system.

Up to now we have referred to fast factorization in general terms, but there are several different ways to classify the speed of an algorithm. Let $N$ be the number to factor. Let $n = \log _{2}N$, the number of bits in $N$. An algorithm's run time is called ``exponential" if it increases exponentially with the number of bits $n$. ``Linear" refers to an algorithm where the time increases proportionally to the number of bits2. ``Polynomial" refers to an algorithm for which the time required is some polynomial function of $n$. Thus, linear time is a special case of polynomial time. There are some algorithms that fall in between polynomial and exponential time and are referred to as sub-exponential. Currently, the best general purpose factorization algorithm is the general number field sieve, with a runtime of $\exp (\frac{4}{3^{2/3}}n^{1/3}(\log n)^{2/3})$ [L]. The $\frac{1}{3}$ in the exponent of $n$ has a very significant effect on the runtime of the algorithm, as this determines that the algorithm is sub-exponential.

Many of the other factorization algorithms are important for theoretical reasons. One tool used by several different algorithms is the continued fraction expression for $\sqrt{N}$, where $N$ is the number to be factored. This expression is calculated recursively:


\begin{displaymath}
x_0 = \sqrt{N},\ b_0 = \left\lfloor x_0 \right\rfloor\ (the\ \textbf{floor}\ of\ x_0).
\end{displaymath}


\begin{displaymath}
\forall i \geq 1\ x_i = \frac{1}{x_{i-1}-b_{i-1}}\,\ b_i = \left\lfloor x_i \right\rfloor
\end{displaymath} (1)


\begin{displaymath}
\sqrt{N} = b_0 + \frac{1}{b_1+\frac{1}{b_2+...}}
\end{displaymath}

It is important to note that at each step in the continued fraction expansion, it is possible to define integers $P_i$ and $Q_i$ such that $x_i-b_i = \frac{\sqrt{N}-P_i}{Q_i}$, where $P_i^2 \equiv N \pmod {Q_i}$. Also, in the expansion of $\sqrt{N}$, $\forall i$, $(-1)^iQ_iQ_0$ is a quadratic residue modulo $N$. Both of these were proven in Hans Riesel [Rie] and are included in Theorem 1 of section 6. The sequence of $Q_i$'s are thus referred to as pseudo-squares. The sequence of $P_i$'s are referred to as residues, since they are produced at each step by reducing the fraction by some integer. Several algorithms, most notably the Morrison - Brillhart algorithm, rely on the quadratic residues in the sequence of pseudo-squares. In 1982 Daniel Shanks developed but never published3 an algorithm called Square Forms Factorization, or SQUFOF, which takes a more direct approach to using these quadratic residues.

Figure 1: SQUFOF algorithm
8cm % latex2html id marker 2180
\fbox{
\begin{minipage}{6cm}
\begin{tabbing}
123...
...extbf{common divisor}, is a nontrivial\\ factor.
\end{tabbing}
\end{minipage}}

For SQUFOF, when the algorithm encountered a perfect square, it started a new sequence from that term by taking the conjugate of the numerator and the square root of the denominator and then continued the expansion until a residue repeated consecutively (Figure 1). At this point, the repeated residue provided a factor of N.

For example, let $N$ be $1353$:

$x_0 = \sqrt{1353}$ $b_0 = 36$

\begin{displaymath}
x_1 = \frac{1}{\sqrt{1353}-36} = \frac{\sqrt{1353}+36}{57} = 1 + \frac{\sqrt{1353}-21}{57}
\end{displaymath}


\begin{displaymath}
x_2 = \frac{57}{\sqrt{1353}-21} = \frac{\sqrt{1353}+21}{16} = 3 + \frac{\sqrt{1353}-27}{16}
\end{displaymath} (2)

The second fraction in each step is found by rationalizing. At each step, the integers taken out are $b_i$ and the remaining fractions are between $0$ and $1$. After subtracting $b_i$ the remaining fraction is inverted to find $x_{i+1}$. As a point of reference, we have approximated so far that $\sqrt{1353} \approx 36+\frac{1}{1+\frac{1}{3}}$. SQUFOF stops here because $16$ is a perfect square. Taking the conjugate of the top and the square root of the bottom, we obtain:


\begin{displaymath}
x_0' = \frac{\sqrt{1353}+27}{4} = 15 + \frac{\sqrt{1353}-33}{4}
\end{displaymath}


\begin{displaymath}
x_1' = \frac{4}{\sqrt{1353}-33} = \frac{\sqrt{1353}+33}{66} = 1 + \frac{\sqrt{1353}-33}{66}
\end{displaymath} (3)

Here, since the residue $33$ is repeated, we quickly find that $33$ is a factor of $1353$. $1353 = 33 \cdot 41 = 3 \cdot 11 \cdot 41$. The explanation of this can be seen by applying cross multiplication of the two fractions on the left of (3) and simplifying. This produces the equation $N-P_i^2 = 1353 - 33^2 = 4 \cdot 66 = Q_iQ_{i+1}$, as in (a) of Theorem 1 in section 6. Since the residue $P_i = P_{i+1} = 33$ is repeated, it must have a factor in common with the pseudo-square $Q_{i+1} = 66$, because of the recurrence relation $P_{i+1} = b_{i+1}Q_{i+1}-P_i$ in (b) of Theorem 1 below. This common factor must then also divide $1353$.

Figure 2: Simple variation of composition of quadratic forms
8cm \fbox{
\begin{minipage}{6cm}
\begin{tabbing}
123\=45\=78\=01\=3456789\= \kill...
... = \frac{N-b^2}{a}$\\ \\
Result: $(a',b',-c')$
\end{tabbing}
\end{minipage}}

Shanks also showed that the second part of SQUFOF (the process of finding a factorization once the perfect square has been found) is linear using a process called composition of quadratic forms. Composition of quadratic forms has several different definitions. Originally, Gauss defined it as the process of multiplying two quadratic forms together and making a substitution to reduce the product to another quadratic form [Ga]. Therefore, I will use the multiplication symbol $(*)$ for composition. However, this computation is slow and complicated. Shanks developed an approximation to this that relies on an extended Euclidean algorithm and the Chinese remainder theorem (Figure 2). This is much faster and simpler and is useful for a variety of theoretical reasons. Shanks also developed two algorithms for performing a partial reduction before the composition was completed: NUCOMP and NUDUPL. NUCOMP composes two different quadratic forms while NUDUPL composes a quadratic form with itself [S]. These algorithms, although slightly faster, are too complicated to explain here.

From the second line of (2), observe that $21^2 + 57 \cdot 16 = 1353$. The quadratic form for this step would be $57x^2+2 \cdot 21xy-16y^2$, abbreviated $(57,21,-16)$. In general, the quadratic form for each step is $(Q_{i-1},P_i,-Q_i)$. Given a quadratic form $(a,b,-c)$, setting $x_0' = \frac{\sqrt{N}-b}{a}$ returns to the continued fraction expansion. The result of composition is another quadratic form similarly related to somewhere else in the continued fraction expansion. Regardless of the composition algorithm chosen, the results are within several steps of each other in the continued fraction expansion.

The results of composition are still often outside the bounds normally obtained in the continued fraction expansion, so it is often necessary to reduce this product further. Shanks followed an algorithm designed by Gauss that uses substitutions. However, this algorithm also sometimes reverses the direction of the quadratic form, a trait that is undesirable for this research for reasons that will become apparent later, so quadratic forms will instead be converted back into a step in the continued fraction expansion in order to reduce them. For example, if we compose $(57,21,-16)$ with itself using NUDUPL, we obtain $(31,50,37)$. In order to reduce this, we write out the step in the expansion that it would represent and proceed from there:


\begin{displaymath}
\frac{31}{\sqrt{1353}-50} = \frac{\sqrt{1353}+50}{-37} = -3 + \frac{\sqrt{1353}-61}{-37}
\end{displaymath}


\begin{displaymath}
\frac{-37}{\sqrt{1353}-61} = \frac{\sqrt{1353}+61}{64} = 1 + \frac{\sqrt{1353}-3}{64}
\end{displaymath}


\begin{displaymath}
\frac{64}{\sqrt{1353}-3} = \frac{\sqrt{1353}+3}{21} = 1 + \frac{\sqrt{1353}-18}{21}
\end{displaymath}


\begin{displaymath}
\frac{21}{\sqrt{1353}-18} = \frac{\sqrt{1353}+18}{49} = 1 + \frac{\sqrt{1353}-31}{49}
\end{displaymath}

In the fourth step, $0<49<2\sqrt{N} \approx 73$, $0<18<\sqrt{N} \approx 36$, and $\sqrt{N}-18 < 49$, so that the quadratic form is completely reduced4. The new quadratic form is found from the fourth step to be $(21,18,-49)$. This same step could be found in the 7th step of the continued fraction expansion if composition of quadratic forms were not used. The number of terms skipped is larger for a quadratic form farther down in the expansion and can be roughly approximated, so that if it is known which step in the process is desired, composition of quadratic forms gets close enough that the answer can be quickly found. Although the first phase of Shanks' SQUFOF algorithm, the process of finding a perfect square, still requires exponential time, the total time is roughly cut in half [Rie].


David Joyner, wdj@usna.edu 2004-02-23