A Geometric view of optimization
Optimization problems generally consist of a set S of potential
solutions and an objective function f, f:S → R, where "R" is
the set of real numbers --- the goal being to find an element
of S that yields the maximum/minimum value of f. Sometimes
it's also convenient to add a constraint function C, C:S
→ {0,1}, and make the goal to find an element of S
yielding the maximum/minimum value of f amongst those elements
satisfying the constraint C.
Example: For the LongPath problem on a graph with n
vertices, S is the set of all sequences of distinct
vertices, f maps a sequence to its length, and C(s) returns
1 if each successive vertex in the sequence is a neighbor of
the vertex before it, and 0 otherwise.
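As a quick sketch, the objective and constraint for LongPath might look like this in Python. The adjacency-set graph representation and the tiny example graph are illustrative assumptions, not from the notes:

```python
def f(seq):
    """Objective: the length of the vertex sequence."""
    return len(seq)

def C(seq, adj):
    """Constraint: 1 if each successive vertex is a neighbor of the
    vertex before it, and 0 otherwise."""
    ok = all(v in adj[u] for u, v in zip(seq, seq[1:]))
    return 1 if ok else 0

# Tiny illustrative graph: a path 0 - 1 - 2, as adjacency sets.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
print(f([0, 1, 2]))       # 3
print(C([0, 1, 2], adj))  # 1
print(C([0, 2, 1], adj))  # 0 (0 and 2 are not neighbors)
```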
Optimization has a clear geometric interpretation when S is a
Euclidean space and f isn't too exotic. For example, if
S = R^2 and f is a continuous, differentiable or piecewise
differentiable function, we have a nice picture like this:
Optimization means finding the point in the S-plane at which
the f(S) surface has the maximum or minimum value.
There are many algorithms for optimization in this kind of
setting. An algorithm that starts with an initial point x in S
and tries to move from x in small steps, searching for nearby
points with better f-values, is doing
local optimization.
The optimization is "local" in the sense of restricting itself
to points near the initial point x. One very straightforward
kind of local optimization is
Hill-Climbing. The idea in
hill climbing is to move from an initial point in small steps,
always taking the step that increases f the most. When you've
arrived at a point at which every small step decreases f, you're
done. This will give you a
local maximum (if you're
restricted to a bounded region in S), but not necessarily the
global maximum. Like this:

This is actually the big problem in optimization: finding
local optima is easy, but finding the global optimum (or
optima) is difficult ... you keep getting stuck in all those
local optima.
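A minimal sketch of hill climbing over S = R^2, assuming an illustrative single-peaked surface, axis-aligned candidate steps, and a fixed step size (none of these specifics come from the notes):

```python
def f(x, y):
    # Illustrative surface with a single maximum at (1, 2).
    return -((x - 1) ** 2 + (y - 2) ** 2)

def hill_climb(x, y, step=0.1):
    while True:
        # Candidate small steps in the four axis directions.
        neighbors = [(x + step, y), (x - step, y),
                     (x, y + step), (x, y - step)]
        best = max(neighbors, key=lambda p: f(*p))
        if f(*best) <= f(x, y):
            return x, y          # every small step decreases f: local maximum
        x, y = best              # take the step that increases f the most

x, y = hill_climb(0.0, 0.0)
print(round(x, 1), round(y, 1))  # converges near (1.0, 2.0)
```

On this single-peaked surface the local maximum found is also the global one; on a multi-peaked surface the same loop would stop at whichever peak the starting point leads to.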
In fact, for NP-Hard problems the difficulty is that you may
have exponentially many local optima to get stuck in!
So what have we learned? We have a geometric view of
optimization: we're searching for peaks (or valleys) over the
space S. We have a concept of local optimization,
and we understand the pitfalls of local minima and maxima.
A Geometric view of optimization: Part II
What about something like the Travelling Salesman Problem
(TSP)? How can we picture the space S of possible solutions?
How can we do local optimization when there's no obvious way
to talk about the distance between solutions? It turns out
that people do think about optimization in exactly the same way as
in the nice Euclidean example above. The only real catch is
that the distance between solutions stays a bit fuzzy. What is
not fuzzy, however, is how to determine which solutions are
close to the current solution. Recall that when we talked
about TSP we described the operator 2-OPT for generating
new solutions from old solutions. Whenever you choose an
operator or set of operators, they actually induce a
definition of "nearby" solutions: solution x' is one step away
from x if one operator application changes x to x'. The
minimum number of operator applications to get from x to x'
defines the distance between x and x', though this is such a
difficult thing to try to compute that it's not really a
useful idea. Local optimization usually means searching all
solutions that are one operator application away from the
current solution and choosing the solution with the best f
value. In other words, we apply local optimization in a
greedy manner!
Using 2-OPT, which recall is equivalent to taking a
sub-path of vertices in the cycle and reversing the
sub-path,
solution X = a→b→c→d→e→f
and solution X' = a→b→d→c→e→f
are neighbors (all you do is reverse c→d), but
solution X'' = a→c→b→d→f→e
is not a neighbor because it is two reversals (i.e. two
2-OPT moves) away.
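The reversal move itself is a one-liner; here is a hedged sketch using the example tours from the text (the list-of-letters tour encoding is an illustrative choice):

```python
def two_opt(tour, i, j):
    """Apply one 2-OPT move: return a new tour with tour[i..j] reversed."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

X = list("abcdef")
X_prime = two_opt(X, 2, 3)   # reverse the sub-path c -> d
print("".join(X_prime))      # abdcef, i.e. a->b->d->c->e->f
```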
Seeing it this way, our "greedy + local optimization",
i.e. taking a greedy solution and then repeatedly applying
2-OPT operations that improve the cycle until we reach a
point at which no 2-OPT operation reduces the cycle cost, is
really just hill-climbing from the starting point provided by
a greedy algorithm to a local optimum. With this picture in
mind, we see the problem too: the local optimum we find may not be a
global optimum.
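Putting the pieces together, the "apply improving 2-OPT moves until none reduces the cycle cost" loop might be sketched as follows. The distance matrix, tour encoding, and four-city example are illustrative assumptions:

```python
import math

def tour_cost(tour, dist):
    """Cost of the cycle visiting the tour's vertices in order."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]]
               for k in range(len(tour)))

def two_opt_hill_climb(tour, dist):
    """Hill climbing with 2-OPT: take improving reversals until stuck."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_cost(candidate, dist) < tour_cost(tour, dist):
                    tour, improved = candidate, True
    return tour  # a local optimum under 2-OPT (not necessarily global)

# Example: four cities on a unit square; the optimal cycle has cost 4.
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
best = two_opt_hill_climb([0, 2, 1, 3], dist)   # start from a crossing tour
print(best, round(tour_cost(best, dist), 2))    # [0, 1, 2, 3] 4.0
```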
What about applying "randomized greedy + local optimization"
repeatedly? Well, we're just hoping that one of the
starting points generated by the randomized greedy algorithm
is close enough to a global minimum that our hill-climbing
finds it instead of some sub-optimal local minimum.
Reiteration time: The "local" in local optimization
refers to solution points that are reachable in few operator
applications, not necessarily solution points that are near
in more comfortable ways. For example, consider squares a
and b on the chessboard below:
|   |   |   |   |   |   |
|   |   |   |   |   |   |
|   | a | b |   |   |   |
|   |   |   |   |   |   |
|   |   |   |   |   |   |
Are a and b near? Well, that depends. If you are a king or a
rook: yes. If you are a knight you might say no, since it takes
three moves to get from a to b. If you're a bishop you'd
certainly say no, since one is unreachable from the other. It
all depends on what "operators" you have available for moving
through the space. In our search for good solutions to
NP-Hard optimization problems, we define the operators we use
in local optimization, and in so doing we define what "local"
really means.
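The operator-induced notion of distance is computable in principle by breadth-first search over the operator graph. Here is a sketch for knight moves on a 5x6 board like the one above; the exact coordinates chosen for squares a and b are assumptions:

```python
from collections import deque

# The eight knight-move "operators".
MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1),
         (1, -2), (2, -1), (-1, -2), (-2, -1)]

def knight_distance(start, goal, rows=5, cols=6):
    """Minimum number of operator applications from start to goal (BFS)."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        (r, c), d = frontier.popleft()
        if (r, c) == goal:
            return d
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append(((nr, nc), d + 1))
    return None  # unreachable (as a and b would be for a bishop)

# a and b sit side by side, as in the board above (0-indexed positions).
print(knight_distance((2, 1), (2, 2)))  # 3
```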
Other optimization strategies: Overview
Various strategies for optimization differ in how they use the
"operators" for local optimization, and how they make big
"non-local" jumps to try to avoid getting permanently stuck in a
- local optimum/minimum, or in a
- plateau, which is a very hard to
spell region in S-space over which the f-surface is flat, with
constant height, or in a
- ridge, which is a curve in S-space over which the
f-surface has constant height
since all of these cause problems for approaches like
hill-climbing.
So here is a non-exhaustive list of techniques; we'll try
to cover these before the semester ends:
- Random Restart Hill Climbing - This is basically just
what we suggested last class: repeatedly start with random
solutions, using local optimization to improve each of them.
- Local Beam Search - This is like doing k hill-climbing
optimizations, but doing them simultaneously. For each of
k solutions, we generate all neighboring solutions
(i.e. reachable in a single operator application), and the
k best of that huge pool of solutions become the new k
solutions ... and we repeat.
- Simulated Annealing - We have a solution x and loop; at
each iteration we choose a random move from x (remember, a move
means an operator application). If the move improves x,
we always commit to it; otherwise we only commit to it
with probability p. This probability actually depends on
how much worse the solution we'd be moving to really is,
and on how many iterations of this loop we've been
through. In particular, the probability of accepting the
move decreases with the number of loop iterations and the
badness of the move.
Thus, early on you tolerate a lot of random exploration,
which you hope will land you near a global optimum at some
point, while later on you basically work on finding the
local optimum in the area you've settled into.
This approach is based on mimicking
the annealing process in metallurgy.
- Genetic Algorithms - This is another approach that mimics
a natural phenomenon. Here the idea is to keep a
population of k solutions, initially chosen randomly.
At each step (or generation, if we decide to roll with the
analogy) we create a new population of k solutions by "mating" the most fit
members of the population, i.e. those with the best f values.
"Mating" here means that new solutions are built by
combining two parent solutions. These new solutions
undergo "mutation" with some small fixed probability,
where mutation is suspiciously like applying one of our
operators for local optimization. How this mating is done
is extremely problem-specific, and will require some
explaining later. Genetic algorithms are extremely
popular, but many critics feel that the appeal is in the
metaphor, not in the utility of the approach. Still,
genetic algorithms are easily the most popular of these
approaches --- not just in computer science, but in all
fields of science and engineering.
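For the curious, the acceptance rule in simulated annealing is commonly realized by accepting a worsening move with probability exp(-delta/T), where the "temperature" T shrinks each iteration. This sketch assumes a 1-D objective, a geometric cooling schedule, and illustrative parameters; none of these specifics come from the notes:

```python
import math
import random

def simulated_annealing(f, x, step=0.5, t0=1.0, cooling=0.995,
                        iters=2000, rng=None):
    """Minimize f by random moves, accepting worse moves with a
    probability that decreases with the move's badness and with time."""
    rng = rng or random.Random(0)   # fixed seed for reproducibility
    t = t0
    for _ in range(iters):
        x_new = x + rng.uniform(-step, step)  # random move (operator application)
        delta = f(x_new) - f(x)               # positive means the move is worse
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = x_new                          # commit: improving, or lucky
        t *= cooling                           # later iterations tolerate less
    return x

# Minimize f(x) = (x - 3)^2; the run should settle near x = 3.
best = simulated_annealing(lambda x: (x - 3) ** 2, x=-5.0)
print(round(best, 1))
```

The early high-temperature phase is the "lot of random exploration"; as T shrinks the loop behaves more and more like plain hill climbing in whatever region it has settled into.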