A Geometric view of optimization
Optimization problems generally consist of a set S of potential
solutions and an objective function f, f:S → R, where "R" is
the set of real numbers --- the goal being to find an element
of S that yields the maximum/minimum value of f. Sometimes
it's also convenient to add a constraint function C, C:S
→ {0,1}, and make the goal to find an element of S
yielding the maximum/minimum value of f amongst those elements
satisfying the constraint C.
Example: For the LongPath problem on a graph with n
vertices, S is the set of all sequences of distinct
vertices, f maps a sequence to its length, and C(s) returns
1 if each successive vertex in the sequence is a neighbor of
the vertex before it, and 0 otherwise.
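As a quick sketch, the objective and constraint for LongPath might look like this in Python. The adjacency-set graph representation and the tiny example graph are illustrative assumptions, not from the notes:

```python
def f(seq):
    """Objective: the length of the vertex sequence."""
    return len(seq)

def C(seq, adj):
    """Constraint: 1 if each successive vertex is a neighbor of the
    vertex before it, and 0 otherwise."""
    ok = all(v in adj[u] for u, v in zip(seq, seq[1:]))
    return 1 if ok else 0

# Tiny illustrative graph: a path 0 - 1 - 2, as adjacency sets.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
print(f([0, 1, 2]))       # 3
print(C([0, 1, 2], adj))  # 1
print(C([0, 2, 1], adj))  # 0 (0 and 2 are not neighbors)
```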
Optimization has a clear geometric interpretation when S is a
Euclidean space and f isn't too exotic. For example, if
S = R^2 and f is a continuous, differentiable or piecewise
differentiable function, we have a nice picture like this:
Optimization means finding the point in the S-plane at which
the f(S) surface has the maximum or minimum value.
There are many algorithms for optimization in this kind of
setting. An algorithm that starts with an initial point x in S
and tries to move from x in small steps, searching for nearby
points with better f-values, is doing
local optimization.
The optimization is "local" in the sense of restricting itself
to points near the initial point x. One very straightforward
kind of local optimization is
Hill-Climbing. The idea in
hill climbing is to move from an initial point in small steps,
always taking the step that increases f the most. When you've
arrived at a point at which every small step decreases f, you're
done. This will give you a
local maximum (if you're
restricted to a bounded region in S), but not necessarily the
global maximum. Like this:

This is actually the big problem in optimization: finding
local optima is easy, but finding the global optimum (or
optima) is difficult ... you keep getting stuck in all those
local optima.
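A minimal sketch of hill climbing over S = R^2, assuming an illustrative single-peaked surface, axis-aligned candidate steps, and a fixed step size (none of these specifics come from the notes):

```python
def f(x, y):
    # Illustrative surface with a single maximum at (1, 2).
    return -((x - 1) ** 2 + (y - 2) ** 2)

def hill_climb(x, y, step=0.1):
    while True:
        # Candidate small steps in the four axis directions.
        neighbors = [(x + step, y), (x - step, y),
                     (x, y + step), (x, y - step)]
        best = max(neighbors, key=lambda p: f(*p))
        if f(*best) <= f(x, y):
            return x, y          # every small step decreases f: local maximum
        x, y = best              # take the step that increases f the most

x, y = hill_climb(0.0, 0.0)
print(round(x, 1), round(y, 1))  # converges near (1.0, 2.0)
```

On this single-peaked surface the local maximum found is also the global one; on a multi-peaked surface the same loop would stop at whichever peak the starting point leads to.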
In fact, for NP-Hard problems the difficulty is that you may
have exponentially many local optima to get stuck in!
So what have we learned? We have a geometric view of
optimization: we're searching for peaks (or valleys) over the
space S. We have a concept of local optimization,
and we understand the pitfalls of local minima and maxima.
A Geometric view of optimization: Part II
What about something like the Travelling Salesman Problem
(TSP)? How can we picture the space S of possible solutions?
How can we do local optimization when there's no obvious way
to talk about the distance between solutions? It turns out
that people do think about optimization in exactly the same way as
in the nice Euclidean example above. The only real catch is
that the distance between solutions stays a bit fuzzy. What is
not fuzzy, however, is how to determine which solutions are
close to the current solution. Recall that when we talked
about TSP we described the operator 2-OPT for generating
new solutions from old solutions. Whenever you choose an
operator or set of operators, they actually induce a
definition of "nearby" solutions: solution x' is one step away
from x if one operator application changes x to x'. The
minimum number of operator applications to get from x to x'
defines the distance between x and x', though this is such a
difficult thing to try to compute that it's not really a
useful idea. Local optimization usually means searching all
solutions that are one operator application away from the
current solution and choosing the solution with the best f
value. In other words, we apply local optimization in a
greedy manner!
Using 2-OPT, which recall is equivalent to taking a
sub-path of vertices in the cycle and reversing the
sub-path,
solution X = a→b→c→d→e→f
and solution X' = a→b→d→c→e→f
are neighbors (all you do is reverse c→d), but
solution X'' = a→c→b→d→f→e
is not a neighbor because it is two reversals (i.e. two
2-OPT moves) away.
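The reversal move itself is a one-liner; here is a hedged sketch using the example tours from the text (the list-of-letters tour encoding is an illustrative choice):

```python
def two_opt(tour, i, j):
    """Apply one 2-OPT move: return a new tour with tour[i..j] reversed."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

X = list("abcdef")
X_prime = two_opt(X, 2, 3)   # reverse the sub-path c -> d
print("".join(X_prime))      # abdcef, i.e. a->b->d->c->e->f
```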
Seeing it this way, our "greedy + local optimization",
i.e. taking a greedy solution and then repeatedly applying
2-OPT operations that improve the cycle until we reach a
point at which no 2-OPT operation reduces the cycle cost, is
really just hill-climbing from the starting point provided by
a greedy algorithm to a local optimum. With this picture in
mind, we see the problem too: the local optimum we find may not be a
global optimum.
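Putting the pieces together, the "apply improving 2-OPT moves until none reduces the cycle cost" loop might be sketched as follows. The distance matrix, tour encoding, and four-city example are illustrative assumptions:

```python
import math

def tour_cost(tour, dist):
    """Cost of the cycle visiting the tour's vertices in order."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]]
               for k in range(len(tour)))

def two_opt_hill_climb(tour, dist):
    """Hill climbing with 2-OPT: take improving reversals until stuck."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_cost(candidate, dist) < tour_cost(tour, dist):
                    tour, improved = candidate, True
    return tour  # a local optimum under 2-OPT (not necessarily global)

# Example: four cities on a unit square; the optimal cycle has cost 4.
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
best = two_opt_hill_climb([0, 2, 1, 3], dist)   # start from a crossing tour
print(best, round(tour_cost(best, dist), 2))    # [0, 1, 2, 3] 4.0
```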
What about applying "randomized greedy + local optimization"
repeatedly? Well, we're just hoping that one of the
starting points generated by the randomized greedy algorithm
is close enough to a global minimum that our hill-climbing
finds it instead of some sub-optimal local minimum.
Reiteration time: The "local" in local optimization
refers to solution points that are reachable in few operator
applications, not necessarily solution points that are near
in more comfortable ways. For example, consider squares a
and b on the chessboard below:
|   |   |   |   |   |   |
|   |   |   |   |   |   |
|   | a | b |   |   |   |
|   |   |   |   |   |   |
|   |   |   |   |   |   |
Are a and b near? Well, that depends. If you are a king or a
rook: yes. If you are a knight you might say no, since it takes
three moves to get from a to b. If you're a bishop you'd
certainly say no, since one is unreachable from the other. It
all depends on what "operators" you have available for moving
through the space. In our search for good solutions to
NP-Hard optimization problems, we define the operators we use
in local optimization, and in so doing we define what "local"
really means.
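The operator-induced notion of distance is computable in principle by breadth-first search over the operator graph. Here is a sketch for knight moves on a 5x6 board like the one above; the exact coordinates chosen for squares a and b are assumptions:

```python
from collections import deque

# The eight knight-move "operators".
MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1),
         (1, -2), (2, -1), (-1, -2), (-2, -1)]

def knight_distance(start, goal, rows=5, cols=6):
    """Minimum number of operator applications from start to goal (BFS)."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        (r, c), d = frontier.popleft()
        if (r, c) == goal:
            return d
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append(((nr, nc), d + 1))
    return None  # unreachable (as a and b would be for a bishop)

# a and b sit side by side, as in the board above (0-indexed positions).
print(knight_distance((2, 1), (2, 2)))  # 3
```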
Other optimization strategies: Overview
Various strategies for optimization differ in how they use the
"operators" for local optimization, and how they make big
"non-local" jumps to try to avoid getting permanently stuck in a
- local optimum/minimum, or in a
- plateau, which is a very hard to
spell region in S-space over which the f-surface is flat, with
constant height, or in a
- ridge, which is a curve in S-space over which the
f-surface has constant height
since all of these cause problems for approaches like
hill-climbing.
So here is a non-exhaustive list of techniques; we'll try
to cover these before the semester ends:
- Random Restart Hill Climbing - This is basically just
what we suggested last class: repeatedly start with random
solutions, using local optimization to improve each of them.
- Local Beam Search - This is like doing k hill-climbing
optimizations, but doing them simultaneously. For each of
k solutions, we generate all neighboring solutions
(i.e. reachable in a single operator application), and the
k best of that huge pool of solutions become the new k
solutions ... and we repeat.
- Simulated Annealing - We have a solution x and loop; at
each iteration we choose a random move from x (remember, a move
means an operator application). If the move improves x,
we always commit to it; otherwise we only commit to it
with probability p. This probability actually depends on
how much worse the solution we'd be moving to really is,
and on how many iterations of this loop we've been
through. In particular, the probability of accepting the
move decreases with the number of loop iterations and the
badness of the move.
Thus, early on you tolerate a lot of random exploration,
which you hope will land you near a global optimum at some
point, while later on you basically work on finding the
local optimum in the area you've settled into.
This approach is based on mimicking
the annealing process in metallurgy.
- Genetic Algorithms - This is another approach that mimics
a natural phenomenon. Here the idea is to keep a
population of k solutions, initially chosen randomly.
At each step (or generation, if we decide to roll with the
analogy) we create a new population of k solutions by "mating" the most fit
members of the population, i.e. those with the best f values.
"Mating" here means that new solutions are built by
combining two parent solutions. These new solutions
undergo "mutation" with some small fixed probability,
where mutation is suspiciously like applying one of our
operators for local optimization. How this mating is done
is extremely problem-specific, and will require some
explaining later. Genetic algorithms are extremely
popular, but many critics feel that the appeal is in the
metaphor, not in the utility of the approach. Still,
genetic algorithms are easily the most popular of these
approaches --- not just in computer science, but in all
fields of science and engineering.
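For the curious, the acceptance rule in simulated annealing is commonly realized by accepting a worsening move with probability exp(-delta/T), where the "temperature" T shrinks each iteration. This sketch assumes a 1-D objective, a geometric cooling schedule, and illustrative parameters; none of these specifics come from the notes:

```python
import math
import random

def simulated_annealing(f, x, step=0.5, t0=1.0, cooling=0.995,
                        iters=2000, rng=None):
    """Minimize f by random moves, accepting worse moves with a
    probability that decreases with the move's badness and with time."""
    rng = rng or random.Random(0)   # fixed seed for reproducibility
    t = t0
    for _ in range(iters):
        x_new = x + rng.uniform(-step, step)  # random move (operator application)
        delta = f(x_new) - f(x)               # positive means the move is worse
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = x_new                          # commit: improving, or lucky
        t *= cooling                           # later iterations tolerate less
    return x

# Minimize f(x) = (x - 3)^2; the run should settle near x = 3.
best = simulated_annealing(lambda x: (x - 3) ** 2, x=-5.0)
print(round(best, 1))
```

The early high-temperature phase is the "lot of random exploration"; as T shrinks the loop behaves more and more like plain hill climbing in whatever region it has settled into.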