Optimization has a clear geometric interpretation when S is a Euclidean space and f isn't too exotic. For example, if S = R², we can picture f as a surface sitting over the plane.

Example: For the LongPath problem on a graph with n vertices, S is the set of all sequences of distinct vertices, f maps a sequence to its length, and C(s) returns 1 if each successive vertex in the sequence is a neighbor of the vertex before it, and 0 otherwise.
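The (S, f, C) formalization for LongPath can be sketched directly in code. The adjacency-set graph below is an illustrative assumption, as are the names `graph`, `f`, and `C`:

```python
# Sketch of the LongPath formalization: a candidate s is a sequence of
# distinct vertices, f(s) is its length, and C(s) checks feasibility.
# The graph (as adjacency sets) is an illustrative assumption.
graph = {
    'a': {'b', 'c'},
    'b': {'a', 'c', 'd'},
    'c': {'a', 'b'},
    'd': {'b'},
}

def f(s):
    """Objective: the length (number of vertices) of the sequence."""
    return len(s)

def C(s):
    """Constraint: 1 if vertices are distinct and each successive vertex
    is a neighbor of the one before it, and 0 (false) otherwise."""
    distinct = len(set(s)) == len(s)
    connected = all(v in graph[u] for u, v in zip(s, s[1:]))
    return 1 if (distinct and connected) else 0

print(C(['a', 'b', 'd']), f(['a', 'b', 'd']))  # 1 3: a valid path of length 3
print(C(['a', 'd']))                           # 0: a and d are not neighbors
```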

Optimization then means finding the point in the S-plane at which the f surface attains its maximum or minimum value.

There are many algorithms for optimization in this kind of setting. An algorithm that starts with an initial point x in S and tries to move from x in small steps, searching for nearby points with better f-values, is doing *local optimization*.
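This step-by-step improvement search is hill climbing, and it can be sketched on a toy one-dimensional S. The objective and neighbor function below are illustrative choices, picked so that there are two peaks:

```python
import random

def hill_climb(x, f, neighbors, max_iters=10_000):
    """Greedy local search: repeatedly move to a better neighbor of x
    (maximizing f) until no neighbor improves, i.e. a local optimum."""
    for _ in range(max_iters):
        better = [n for n in neighbors(x) if f(n) > f(x)]
        if not better:
            return x  # local optimum: no neighbor improves f
        x = random.choice(better)
    return x

# Toy example: S is the integers 0..99, f has a local peak at 20
# (height 0) and a global peak at 80 (height 100).
f = lambda x: -(x - 20) ** 2 if x < 50 else 100 - (x - 80) ** 2
neighbors = lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 99]

print(hill_climb(10, f, neighbors))  # 20: stuck at the local peak
print(hill_climb(60, f, neighbors))  # 80: reaches the global peak
```

Note that the result depends entirely on where we start: from x = 10 we climb to the merely local peak, which is exactly the pitfall discussed below.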

This is actually the big problem in optimization: finding local optima is easy, but finding the global optimum (or optima) is difficult ... you keep getting stuck in all those local optima. In fact, for NP-Hard problems the difficulty is that you may have exponentially many local optima to get stuck in!

So what have we learned? We have a geometric view of
optimization: we're searching for peaks (or valleys) over the
space S. We have a concept of *local* optimization,
and we understand the pitfalls of local minima and maxima.

Using 2-OPT, which recall is equivalent to taking a sub-path of vertices in the cycle and reversing it, solution X = a→b→c→d→e→f and solution X' = a→b→d→c→e→f are neighbors (all you do is reverse c→d), but solution X'' = a→c→b→d→f→e is not a neighbor because it is two reversals (i.e. two 2-OPT moves) away.

Seen this way, our "greedy + local optimization" approach, i.e. taking a greedy solution and then repeatedly applying 2-OPT operations that improve the cycle until we reach a point where no 2-OPT operation reduces the cycle cost, is really just hill-climbing from the starting point provided by the greedy algorithm to a local optimum. With this picture in mind, we see the problem too: the local optimum we find may not be a global optimum.
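A minimal sketch of 2-OPT local optimization, assuming tours are lists of city indices and distances come from a matrix; the four-city instance below is illustrative:

```python
import itertools

def tour_cost(tour, dist):
    """Total cost of the cycle, including the edge back to the start."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def two_opt(tour, dist):
    """Repeatedly reverse sub-paths (2-OPT moves) as long as some
    reversal reduces the cycle cost; stops at a 2-OPT local optimum."""
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(tour)), 2):
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            if tour_cost(candidate, dist) < tour_cost(tour, dist):
                tour, improved = candidate, True
    return tour

# Illustrative instance: four cities where adjacent pairs cost 1 and
# "diagonal" pairs cost 2, so the best cycle has cost 4.
dist = [[0, 1, 2, 1],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [1, 2, 1, 0]]
tour = two_opt([0, 2, 1, 3], dist)   # start from a crossing tour of cost 6
print(tour_cost(tour, dist))         # 4: a single reversal fixes it
```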

What about applying "randomized greedy + local optimization" repeatedly? Well, we're just hoping that one of the starting points generated by the randomized greedy algorithm is close enough to a global minimum that our hill-climbing finds it instead of some sub-optimal local minimum.

**Reiteration time:** The "local" in local optimization
refers to solution points that are reachable in few operator
applications, not necessarily solution points that are near
in more comfortable ways. For example, consider squares a
and b on the chessboard below:

|   |   |   |   |   |   |
|---|---|---|---|---|---|
|   |   |   |   |   |   |
|   | a | b |   |   |   |
|   |   |   |   |   |   |
|   |   |   |   |   |   |

(If the operator is, say, a knight's move, then a and b are physically adjacent squares, yet b is not reachable from a in a single operator application, so b is not in a's neighborhood.)

Local optimization can get stuck in a:

- local optimum/minimum, or in a
- plateau, which is a very hard to spell region in S-space over which the f-surface is flat, with constant height, or in a
- ridge, which is a curve in S-space over which the f-surface has constant height

There are several standard strategies for trying to escape local optima:

- Random Restart Hill Climbing - This is basically just what we suggested last class: repeatedly start from random solutions, using local optimization to improve each of them, and keep the best result found.
- Local Beam Search - This is like doing k hill-climbing optimizations, but doing them simultaneously. For each of k solutions, we generate all neighboring solutions (i.e. reachable in a single operator application) and the k best of that huge pool of solutions become the new k solutions ... and we repeat.
- Simulated Annealing - We have a solution x and loop; at each iteration we choose a random move from x (remember, a move means an operator application). If the move improves x, we always commit to it; otherwise we only commit to it with some probability p. This probability depends on how much worse the solution we'd be moving to really is, and on how many iterations of this loop we've been through. In particular, the probability of accepting the move decreases with the number of loop iterations and with the badness of the move. Thus, early on you tolerate a lot of random exploration, which you hope will land you near a global optimum at some point, while later on you basically work on finding the local optimum in the area you've settled into. This approach is based on mimicking the annealing process in metallurgy.
- Genetic Algorithms - This is another approach that mimics a natural phenomenon. Here the idea is to keep a population of k solutions, initially chosen randomly. At each step (or generation, if we decide to roll with the analogy) we create a new population of k solutions by "mating" the most fit members of the population, i.e. those with the best f values. "Mating" here means that new solutions are built by combining two parent solutions. These new solutions undergo "mutation" with some small fixed probability, where mutation is suspiciously like applying one of our operators for local optimization. How this mating is done is extremely problem-specific, and will require some explaining later. Genetic algorithms are extremely popular, but many critics feel that the appeal is in the metaphor, not in the utility of the approach. Still, genetic algorithms are easily the most popular of these approaches --- not just in computer science, but in all fields of science and engineering.
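Simulated annealing, as described above, can be sketched as follows. The source doesn't pin down the acceptance probability, so the common choice exp(-Δ/T) with a geometric cooling schedule is assumed here, and the toy objective and move operator are illustrative:

```python
import math
import random

def simulated_annealing(x, f, random_move, iters=20_000, t0=1.0, cooling=0.999):
    """Minimize f: always accept improving moves; accept a worsening move
    with probability exp(-delta / T). T shrinks every iteration, so the
    acceptance probability falls with both badness and iteration count."""
    t = t0
    best = x
    for _ in range(iters):
        y = random_move(x)
        delta = f(y) - f(x)
        if delta < 0 or random.random() < math.exp(-delta / t):
            x = y                     # commit to the move
        if f(x) < f(best):
            best = x                  # remember the best solution seen
        t *= cooling                  # cool: less random exploration later
    return best

# Toy example: a bumpy 1-D objective over the integers 0..99 -- a parabola
# with sine wiggles, so plain hill climbing can get trapped in a wiggle.
f = lambda x: (x - 70) ** 2 + 30 * math.sin(x)
move = lambda x: max(0, min(99, x + random.choice([-3, -2, -1, 1, 2, 3])))
result = simulated_annealing(5, f, move)
print(result, f(result))
```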
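A genetic-algorithm sketch on a toy bit-string problem ("max ones": maximize the number of 1 bits). Since mating is extremely problem-specific, the one-point crossover and per-bit mutation below are illustrative choices, not *the* method:

```python
import random

def genetic_algorithm(f, length, k=20, generations=100, mutation_rate=0.05):
    """Evolve a population of k bit-string solutions, maximizing f.
    One-point crossover and per-bit mutation are toy choices; real
    encodings and mating schemes are highly problem-specific."""
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(k)]
    for _ in range(generations):
        population.sort(key=f, reverse=True)
        parents = population[:k // 2]          # the most fit become parents
        children = []
        while len(children) < k:
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(length):            # occasional mutation
                if random.random() < mutation_rate:
                    child[i] = 1 - child[i]
            children.append(child)
        population = children                  # next generation
    return max(population, key=f)

best = genetic_algorithm(sum, 20)  # fitness f = number of 1 bits
print(sum(best))
```

Note that this sketch has no elitism: the whole population is replaced each generation, so the best solution ever seen can occasionally be lost to mutation.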