Problem: Longest Common Subquence Given: X and Y sequences of length m and n respectively Produce: sequence Z that is a subsequence of both X and Y, and has maximum length among all common subsequencesWe'll talk about these as subsequences of strings (sequences of characters), since the visualizations are easier this way, but we accept arbtrary objects, as long as there is an equality test. So, for example,
LCS(holiday,echoed) = hodSo could we try to make a greedy algorithm for this? Well, there is a way to phrase solving the problem as a sequence of "moves". Here are the possible moves:
add&chop - add last element of X to solution, chop the last element off both sequences X-chop - chop the last element off X Y-chop - chop the last element off YTo get "hod" from our example, we might do
chop X, chop X, add-n-chop, chop X chop X chop Y, add-n-chop, chop Y, chop Y, add-n-chopThe problem is, however, how do we chose the right move? If the last two elements of X and Y are the same, the add-n-chop is clearly right. Otherwise, we certainly want either chop-X or chop-Y ... but which?
You can dream up some greedy strategies, but for every reasonable greedy choice you come up with, you will be able to find a counterexample. So, now what?
Algorithm: LLCS (Length of the Longest Common Subsequence) Input: X,m - where X is a sequence of at least m elements Y,n - where Y is a sequnece of at least n elements Output: k - the length of the longest common subsequence of X1,..,Xm and Y1,...,Yn if m = 0 or n = 0 return 0 else if X[m-1] = Y[n-1] return 1 + LLCS(X,m-1,Y,n-1) else return max( LLCS(X,m,Y,n-1) , LLCS(X,m-1),Y,n )This works, as you can see if you try running LLCS.cpp. However, you'll notice it's pretty slow. In fact, if you instrument it to count the number of calls to LLCS, you'll find that it gets out of hand.
$ ./go fourscoresevenyearsago hickorydickorydock The length of the longest common subsequence is 7 number of recursive calls is 46639471764That's 46 billion rucursive calls to compute this! What's going on?
What you should notice is that we end up computing the same things over and over. For example, LCS(ab,a) shows up three times, and we compute LCS(a,a) six times! On longer inputs, this really starts to add up!
T[i][j] = LLCS(X,i,Y,j)If LLCS(X,i,Y,j) hasn't been computed yet, we'll set T[i][j] = -1 to flag that for us. This gives us something like: LLCSmemo.cpp. If we augment this to track the number of recursive calls, things look a lot better!
$ ./gom fourscoresevenyearsago hickorydickorydock The length of the longest common subsequence is 7 number of recursive calls is 419That's about 111 million times faster. I'd call that a god day's work!
Recall that we've phrased the finding of a common subsequence as a sequence of "moves" to make. The problem is, we don't know ahead of time what the best "move" to make is at any given step. However, we discover what at each point is the best "move" as LLCSmemo runs. So what if we augment it to remember which move worked best? In particular, lets create a second table, E, to store that information in. We'll set it up like this:
E[i][j] = 1 if add&chop is the optimum move for X1,...,Xi and Y1,...,Yj = 2 if Y-chop is the optimum move for X1,...,Xi and Y1,...,Yj = 3 if X-chop is the optimum move for X1,...,Xi and Y1,...,Yj
How does this help? Well, we can start over and construct the optimum solution by choosing moves, except that now we have the E-table that tells us, for every scenario, what the optimum move is.An augmented version of the memoized algorithm that does all this is in lcslmemorecover.cpp.