Timothy Brian Brock 1
Date: April 24, 2006
This paper will discuss the history of linear feedback shift registers (LFSR) in cryptographic applications and will attempt to implement an algorithm in SAGE [11] and Python (www.python.org) to create a linear feedback shift register sequence (LFSR sequence) in cryptography. Also, this paper will attempt to implement the Berlekamp Iterative Algorithm in SAGE and Python. This algorithm will be able to use the Linear Feedback Shift Register sequence generated by the first algorithm to find the sequence's connection polynomial.
I will attempt to show that the connection polynomial of a given LFSR
sequence is the reverse of a generator polynomial of the cyclic code
of length
, where
is also the period of the LFSR sequence.
This will provide a connection between cyclic error-correcting codes
and LFSR sequences.
A linear feedback shift register can be easily implemented in hardware or software and is used to create a pseudo-random sequence of numbers for many different applications. These applications include uses in consumer electronics, such as cellphones and digital cable [2]; multiple access and polling techniques; secure and privacy communications; error detecting and correcting codes; and cryptographic systems [1].
In consumer electronics, a Linear Feedback Shift Register can be used as a counter [2]. When used in this manner, LFSRs are desirable because they perform the function with less resources and usually much faster than the conventional counters, such as binary counters or Gray Code counters. Though it goes against intuition, LFSRs can also be used to generate pseudo-noise which is used by such consumer electronics as cellphones and digital cable to increase the reliability of the signal [2]. LFSR can also be used in spread spectrum systems [2]. A spread spectrum system utilizes the entire bandwidth a signal may use to send information by spreading the data frequency over many frequencies in the bandwidth. The next frequency to be utilized is determined by the LFSR sequence. Other more common applications of a LFSR is the use in white noise machines (such as the one shown below) and music synthesizers.
Another application of LFSRs is the creation of pseudo-random sequences that can be used in cryptography. According to the Merriam-Webster Dictionary, cryptography is the enciphering and deciphering of messages in secret code or cipher. The LFSR sequence is a pseudo-random number sequence that can be applied to a message as a cipher, a cipher is the system used to create a secret message. Throughout this paper, the cipher will be a sequence of binary terms that is added to the binary message to provide the encoded message, also known as the ciphertext. The cipher encodes the message so that only someone with the key knows the proper way to decode the message and is then able to read the message, anyone without the key receives the ciphertext and reads only nonsense. The key is a piece of information that allows a user to determine the specific cipher used in encrypting the message. In digital communication, the enciphering of a message with a LFSR sequence is the same as adding in pseudo-random noise. The proper recipient with a key removes the noise from the message, but a third party without the key interprets the message only as noise.
A linear feedback shift register sequence is a pseudo-random sequence
of numbers that is often created in a hardware implementation of a
linear feedback shift register. A LFSR is ``an algorithm which
yields a sequence of numbers which is eventually periodic'' [9].
When a LFSR is implemented in hardware, a LFSR sequence is recursively generated
by taking the output from the last ``stage'' of a given LFSR to compute the next ``stage.'' An example
of a LFSR implemented in hardware is included in Figure 1 (from Massey [4]). This LFSR
is of length
, and each state cell's current state is used as the input
to the mod 2 adder. This adder is implemented in hardware with an exclusive-or function.
Since this is a shift register, each iteration of the
register causes the state of each state cell to shift to the next cell (in
this case, to the right). We use the output of the last state cell to provide
the next term of the sequence after each iteration.
This hardware LFSR can be modeled mathematically to generate a LFSR sequence.
In order to build this sequence, three pieces of information are needed.
They are the key, the initial fill, and an algorithm to obtain the
next term of the sequence. In the hardware implementation, the connections between the state
cells and the mod 2 adder determines how the outputs of the
cells are used as inputs to the mod 2 adder. In the same way, the key determines how the previous
terms of the LFSR sequence are used to compute the next term in the sequence. The key may be represented
as a vector
, but is more often
defined by a polynomial, known as the connection polynomial
According to Massey [4], the initial fill is the initial values of the state cells,
the initial contents of the
stages of Figure 1 above.
In general, the LFSR sequence is defined by the following recursion relation
Imagine that Alice and Bob are sending messages back and forth to one another. If the content of these messages was not very secret and Alice and Bob did not care if anyone else read the message, they would not bother with any kind of encryption. Then, if a third person, say Charlie, intercepts the message he can read it without any difficulty. If, on the other hand, Alice and Bob were exchanging information they wanted to keep secret, they would need to employ some kind of encryption system to encode their messages. Depending upon the type of encryption they employ, Alice and Bob would receive a certain level of security against the actions of third parties.
No matter what type of encryption Alice and Bob use, there will be several objectives that Alice and Bob want the encryption method to achieve. Out of a long list objectives, there are four that form a framework upon which all the others are built [5]. The four security objectives that would apply to this message include:
A LFSR sequence stream cipher achieves secrecy by adding the sequence to the binary representation of the message. The sequence appears as added noise to the message and anyone who intercepts the message would be unable to read the information due to the added sequence. The problem with relying on LSFR sequences for secrecy is the minimal connection polynomial of the sequence is easily determined using the Berlekamp-Massey Algorithm, which is discussed later. The minimal connection polynomial is the key necessary to generate the LFSR sequence used as the stream cipher. With this key, Charlie can easily add the sequence to the ciphertext to retrieve the original message. One method Alice and Bob can use to have more secrecy in the stream cipher is by picking LFSR sequences with extremely long periods (i.e. period length
). With long period lengths, Charlie needs a longer amount of time to find the minimal connection polynomial and he needs more terms of the sequence to determine the correct polynomial.
As presented in this paper, LFSR sequences combined with some error-correcting codes can provide a limited amount of integrity. While this would be most useful against the attacks of noise in the transmission lines, if Charlie was desperate enough to attempt to try to flip random bits in the hopes of changing the message, the error-correcting code may provide some protection against this by correcting the errors intentionally made to the message. The actual amount of protection provided by this, however, is almost negligible since the error-correcting codes can only correct a limited number of errors and a malicious attack would probably be able to overwhelm the capabilities of an error-correcting code very easily. On its own, a LFSR sequence provides very little integrity.
In order to provide for authentication, Alice and Bob could each create their own stream cipher using a different LFSR sequence. Alice would then send her key and initial fill to Bob and Bob would send his key and initial fill to Alice. Alice would then add together her sequence and Bob's sequence to obtain a new sequence. She would use this sequence as the stream cipher and encode her message to send to Bob. Bob would also add together the two sequences to obtain the same stream cipher. This would allow Alice and Bob to verify that they are talking with each other since they created their stream cipher by adding their own individual sequences.
A stream cipher is a sequence of binary digits called a cryptographic bit-stream [7]. The stream cipher is then added to the message to create a ciphertext (encryption) and can be added to the ciphertext to obtain the original message (decryption). One type of stream cipher that is used to create ciphertexts is pseudo-random binary sequences. Linear feedback shift register sequences are one type of pseudo-random binary sequence that are easily generated by linear feedback shift registers.
Ideally, a sequence would have infinite length and could achieve complete randomness. The reality of practical application and construction techniques necessitates the use of only finite sequences. Since finite sequences can never be truly random, there are certain properties singled out that are associated with randomness. Golomb states these properties are:
when Any sequence that has these three properties is considered to be pseudo-random. We find that linear feedback shift register sequences fulfill the above three properties and are pseudo-random. Linear Feedback Shift Registers are also easy to use as stream ciphers since adding the LFSR sequence to the message encodes it and adding the sequence again to the ciphertext returns the original message [9].
Suppose Alice wanted to send the message
``BEAT ARMY!'' to Bob, but she wanted to keep it secret. By converting
``BEAT ARMY!'' from english to binary, we have a string of binary bits
When Bob receives the ciphertext, however, he is able to decrypt it since he has the stream cipher that he and Alice created earlier. By adding the stream cipher to the ciphertext, he will uncover the original message.
We recall some background definitions from McGowan [12]
Cyclic codewords are conveniently represented as polynomials modulo
. In
fact, if
then let
Denote by
the ring of polynomials with coefficients in
modulo
:
In fact GUAVA allows you to easily pass back and forth between codewords as vectors and codewords as polynomials.
In order to define the generator polynomial of a cyclic code we need the following mathematical fact.
proof: Suppose not. Let
be a non-zero element in
of smallest degree. Pick an
arbitrary non-zero element
in
. By the division algorithm, we can write
where
and
are polynomials and the degree of
is
strictly less than the degree of
. Notice that
belongs
to
by definition. This contradicts the assumption that
has smallest
degree unless
. Therefore every element of
is a multiple of
.
Take
.
It is also necessary to define what the check polynomial of a cyclic code is.
The check polynomial is used to determine whether or not a codeword belongs to
.
In section 1.3, it was shown that LFSR sequences can be described
with two vectors and a recursive relation. Another way to describe
the LFSR sequence is using a polynomial. The connection polynomial,
, describes
the LFSR sequence
As recalled above, a cyclic code can also be described by a polynomial, namely the generator polynomial. In practice, a cyclic code can be generated by a shift register for actual application in communication and computers [8]. This statement implies that LFSR sequences are in some way connected to cyclic codes. Before I show the connection with cyclic codes, however, I will define the reverse of a polynomial.
The basic idea behind decoding a LFSR sequence is to find the key.
The generator function of the LFSR sequence
is a power
series
It is a well known theorem that a real number is rational if and only if its sequence of decimal digits is eventually periodic [10]. The analog for this theorem for a power series is the following result.
This lemma immediately implies the following result.
In some cases, it is necessary to recover the connection polynomial of a LFSR sequence from the sequence. This is true when attempting to do cryptanalysis on a piece of intercepted code. When a part of the stream cipher is intercepted, the connection polynomial can be recovered even if the number of terms of the cipher is less than the period of the sequence. Once this polynomial is known, the entire cipher can be generated and any messages that are encrypted using that particular sequence as a stream cipher can be decrypted and read by the third party.
An algorithm exists that readily provides a connection polynomial
given only a few terms of a LFSR sequence. This algorithm is known
as the Berlekamp-Massey algorithm. In my studies, I have been able
to determine the connection polynomial of a LFSR sequence of period
15 with only 8 terms of the sequence. This can be generalized since if we
know that a sequence has a minimal connection polynomial with degree
,
then only
terms of the sequence need to be known in order to determine
the correct connection polynomial. We can determine
if we know the period
length of the sequence, since the period
[14]. This is an extremely powerful
tool for cryptanalysts trying to break stream ciphers generated from
LFSRs, since only a relatively small sample of a long period sequence
is needed to break the cipher. The algorithm as it is described by
James Massey is presented below [4]:
Input: a LFSR sequence of length
, where
is even.
Output: a connection polynomial
of the minimal LFSR.
Though an implementation of this algorithm already exists in SAGE and Python, we believe that it is possible to implement this algorithm more efficiently. It is hoped that a faster implementation will be created for SAGE and Python for this algorithm in this project. This algorithm is analyzed further in [3].
Of course, when performing cryptanalytic work on an intercepted enciphered message, one may not always
have all the information available about the stream cipher to make breaking the code as easy as I have just shown.
Prior to beginning the example, it was mentioned that for a LFSR sequence of known period length
, one can easily determine how
many terms of the sequence were necessary to find the minimal connection polynomial. Further research will look at what to do when the full period length of the sequence is unkown.
"""
Linear feedback shift register (LFSR) sequence commands
Stream ciphers have been used for
a long time as a source of pseudo-random number generators.
S. Golomb [G] gives a list of three
statistical properties a
sequence of numbers ${\bf a}=\{a_n\}_{n=1}^\infty$,
$a_n\in \{0,1\}$, should display to be considered
``random''. Define the {\bf autocorrelation} of
${\bf a}$ to be
\[
C(k)=C(k,{\bf a})=\lim_{N\rightarrow \infty}
{1\over N}\sum_{n=1}^N (-1)^{a_n+a_{n+k}}.
\]
In the case where ${\bf a}$ is periodic with
period $P$ then this reduces to
\[
C(k)={1\over P}\sum_{n=1}^P (-1)^{a_n+a_{n+k}}.
\]
Assume ${\bf a}$ is periodic with period $P$.
\begin{itemize}
\item[] {\bf balance}: $|\sum_{n=1}^P(-1)^{a_n}|\leq 1$.
\item[] {\bf low autocorrelation}:
\[
C(k)=
\left\{
\begin{array}{cc}
1,& k=0,\\
\epsilon, & k\not= 0.
\end{array}
\right.
\]
(For sequences satisfying these first two properties,
it is known that $\epsilon=-1/P$ must hold.)
\item[] {\bf proportional runs property}:
In each period, half the runs have length $1$,
one-fourth have length $2$, etc. Moveover, there
are as many runs of $1$'s as there are of
$0$'s.
\end{itemize}
A {\bf general feedback shift register} is a map
$f:{\bf F}_q^d\rightarrow {\bf F}_q^d$
of the form
\[
\begin{array}{c}
f(x_0,...,x_{n-1})=(x_1,x_2,...,x_n),\\
x_n=C(x_0,...,x_{n-1}),
\end{array}
\]
where $C:{\bf F}_q^d\rightarrow {\bf F}_q$ is a given
function. When $C$ is of the form
\[
C(x_0,...,x_{n-1})=a_0x_0+...+a_{n-1}x_{n-1},
\]
for some given constants $a_i\in {\bf F}_q$, the
map is called a {\bf linear feedback shift register
(LFSR)}.
{\bf Example of a LFSR} Let
\[
f(x)=a_{{0}}+a_{{1}}x+...+a_{{n}}{x}^n+...,
\]
\[
g(x)=b_{{0}}+b_{{1}}x+...+b_{{n}}{x}^n+...,
\]
be given polynomials in ${\bf F}_2[x]$ and let
\[
h(x)={f(x)\over g(x)}=c_0+c_1x+...+c_nx^n+... \ .
\]
We can compute a recursion formula which allows us to rapidly compute
the coefficients of $h(x)$ (take $f(x)=1$):
\[
c_{n}=\sum_{i=1}^n {{-b_i\over b_0}c_{n-i}}.
\]
The coefficients of $h(x)$ can, under certain conditions on
$f(x)$ and $g(x)$, be considered ``random'' from certain statistical
points of view.
{\bf Example}:
For instance, if
\[
f(x)=1,\ \ \ \ g(x)=x^4+x+1,
\]
then
\[
h(x)=1+x+x^2+x^3+x^5+x^7+x^8+...\ .
\]
The coefficients of $h$ are
\[
\begin{array}{c}
1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, \\
1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, ...\ .
\end{array}
\]
The sequence of $0,1$'s is periodic with period
$P=2^4-1=15$ and satisfies Golomb's three
randomness conditions.
However, this sequence of period 15 can be ``cracked''
(i.e., a procedure to reproduce $g(x)$)
by knowing only 8 terms! This is the function of the
Berlekamp-Massey algorithm [M], implemented as
\code{berlekamp_massey.py}.
[G] Solomon Golomb, {\bf Shift register sequences},
Aegean Park Press, Laguna Hills, Ca,
1967
[M] James L. Massey, ``Shift-Register Synthesis and BCH Decoding.''
IEEE Trans. on Information Theory, vol. 15(1), pp. 122-127, Jan 1969.
AUTHOR:
-- Timothy Brock
Created 11-24-2005 by wdj. Last updated 12-02-2005.
"""
###########################################################################
# Copyright (C) 2006 Timothy Brock
# and William Stein <wstein@ucsd.edu>
#
# Distributed under the terms of the GNU General Public License (GPL)
#
# http://www.gnu.org/licenses/
###########################################################################
import copy
from sage.structure.all import Sequence
from sage.rings.all import is_FiniteField
def lfsr_sequence(key, fill, n):
r"""
This function creates an lfsr sequence.
INPUT:
key -- a list of finite field elements, [c_0,c_1,...,c_k].
fill -- the list of the initial terms of the lfsr sequence,
[x_0,x_1,...,x_k].
n -- number of terms of the sequence that the
function returns.
OUTPUT:
The lfsr sequence defined by $x_{n+1} = c_kx_n+...+c_0x_{n-k}$,
for $n \leq k$.
EXAMPLES:
sage: F = GF(2); l = F(1); o = F(0)
sage: F = GF(2); S = LaurentSeriesRing(F,'x'); x = S.gen()
sage: fill = [l,l,o,l]; key = [l,o,o,l]; n = 20
sage: L = lfsr_sequence(key,fill,20); L
[1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
sage: g = berlekamp_massey(L); g
x^4 + x^3 + 1
sage: (1)/(g.reverse()+O(x^20))
1 + x + x^2 + x^3 + x^5 + x^7 + x^8 + x^11 + x^15 + x^16 +
x^17 + x^18 + O(x^20)
sage: (1+x^2)/(g.reverse()+O(x^20))
1 + x + x^4 + x^8 + x^9 + x^10 + x^11 + x^13 + x^15 +
x^16 + x^19 + O(x^20)
sage: (1+x^2+x^3)/(g.reverse()+O(x^20))
1 + x + x^3 + x^5 + x^6 + x^9 + x^13 + x^14 + x^15 +
x^16 + x^18 + O(x^20)
sage: fill = [l,l,o,l]; key = [l,o,o,o]; n = 20
sage: L = lfsr_sequence(key,fill,20); L
[1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1]
sage: g = berlekamp_massey(L); g
x^4 + 1
sage: (1+x)/(g.reverse()+O(x^20))
1 + x + x^4 + x^5 + x^8 + x^9 + x^12 + x^13 + x^16 + x^17 + O(x^20)
sage: (1+x+x^3)/(g.reverse()+O(x^20))
1 + x + x^3 + x^4 + x^5 + x^7 + x^8 + x^9 + x^11 +
x^12 + x^13 + x^15 + x^16 + x^17 + x^19 + O(x^20)
AUTHOR:
-- Timothy Brock (11-2005, with code modified from Python Cookbook,
\url{http://aspn.activestate.com/ASPN/Python/Cookbook/}
"""
if not isinstance(key, list):
raise TypeError, "key must be a list"
key = Sequence(key)
F = key.universe()
if not is_FiniteField(F):
raise TypeError, "universe of sequence must be a finite field"
s = fill
k = len(fill)
L = []
for i in range(n):
s0 = copy.copy(s)
L.append(s[0])
s = s[1:k]
s.append(sum([key[i]*s0[i] for i in range(k)]))
return L
import copy
def autocorrelation(L,p,k):
"""
INPUT:
L -- is a periodic sequence of elements of ZZ or GF(2). L must have length >= p
p -- the period of L
k -- k is an integer (0 < k < p)
OUTPUT:
autocorrelation sequence of L
EXAMPLES:
AUTHOR: Timothy Brock (February 2006)
"""
L0 = copy.copy(L)[:int(p)]
L0=L0+L0[:int(k)]
L1 = [int(L0[i])*int(L0[i+int(k)])/p for i in range(p)]
return sum(L1)
def massey(s):
"""
INPUT:
s -- a sequence of elements of a finite field (F) of even length
OUTPUT:
C(x) -- the connection polynomial of the minimal LFSR.
This implements the algorithm in section 3 of J. L. Massey's article [M].
EXAMPLE:
sage: F = GF(2)
sage: F
Finite Field of size 2
sage: o = F(0); l = F(1)
sage: key = [l,o,o,l]; fill = [l,l,o,l]; n = 20
sage: s = lfsr_sequence(key,fill,n); s
[1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
sage: massey(s)
x^4 + x + 1
AUTHOR: Timothy Brock (24 Mar 2006)
REFERENCES:
[M] J. L. Massey, "Shift-register synthesis and BCH decoding,"
IEEE Trans. Inform. Theory, vol. IT-15, pp. 122--127, Jan. 19 69.
"""
FF = s[0].base_ring()
R = PolynomialRing(FF, "x")
x = R.gen()
C = R(1); B = R(1); m = 1; b = FF(1); L = 0; N = 0
##The above was initializing steps
while N < len(s):
if L > 0:
r = min(L+1,C.degree()+1)
d = s[int(N)] + sum([(C.list())[int(i)]*s[int(N-i)] for i in range(1,r)])
if L == 0:
d = s[int(N)]
if d == 0:
m += 1
N += 1
if d > 0:
if 2*L > N:
C = C - d*b**(-1)*x**m*B
m += 1
N += 1
else:
T = C
C = C - d*b**(-1)*x**m*B
L = N + 1 - L
m = 1
b = d
B = T
N += 1
return C
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.70)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -t 'T. Brock Math Honors Thesis 2005-2006' -split 0 brock-honorsthesis.tex
The translation was initiated by David Joyner on 2006-04-24