Game Decision Models
Two-Player Adversarial Games
Up to this point, our search models have operated under the assumption that the agent is the sole force affecting the environment. We must now extend this concept to cover scenarios where external agents also instigate change. This includes:
Two-player adversarial games with full information.
Card games (which are significantly more complex).
Games of chance (which introduce probability and are harder still).
Example Game State
In this example, the student goes first, attempting to form a vertical line.
| P | S | S | P | S | P | P | S |
|---|---|---|---|---|---|---|---|
| S | P | P | S | S | S | S | P |
| S | P | S | S | S | S | P | S |
| P | S | S | S | S | P | S | P |
| P | S | S | S | S | P | S | S |
| S | S | S | S | P | S | S | S |
| S | S | S | S | P | P | P | P |
| S | P | S | S | S | S | S | P |
Defining the Search Space
To analyze these games, we define the search space as follows:
Initial State: The board position as specified by the game rules at the start.
Operators: The legal moves available in the game.
Goal: Any state that meets the terminal conditions defined in the rules (e.g., checkmate, full board).
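These three components can be sketched concretely for a toy game. The example below is a one-pile Nim variant invented purely for illustration (the state, operators, and names are ours, not part of any game discussed above):

```python
# Search-space definition for a toy game: a one-pile Nim variant in which
# players alternately remove 1 or 2 sticks, and taking the last stick wins.

INITIAL_STATE = 5                      # initial state fixed by the "rules"

def operators(state):
    """Legal moves: remove 1 or 2 sticks, never more than remain."""
    return [m for m in (1, 2) if m <= state]

def apply_move(state, move):
    """Applying an operator yields a successor state."""
    return state - move

def is_goal(state):
    """Terminal condition: the pile is empty."""
    return state == 0
```

Any game with a defined start, legal-move generator, and terminal test fits this mold; only the three functions change.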
The Minimax Strategy
We define two players: Max and Min.
Max goes first and aims to win (maximize the score).
Min plays to thwart Max’s plans (minimize Max’s score).
Max must devise a strategy to ensure victory. A strategy is a conditional plan: “First I will make move A. If Min responds with X, I will do Y; if Min responds with Z, I will do W.” This structure allows us to build an AND/OR tree representing the game flow.
The Minimax Algorithm
This logical structure forms the basis of the Minimax algorithm:
Generate the entire game tree of possible moves.
Evaluate the terminal states (leaves) and assign values (e.g., 0 for a tie, 1 for a win, -1 for a loss).
Backpropagate these values up the tree.
Result: A single backed-up value reaches the root; it represents the best outcome Max can expect, assuming both players play optimally.
Selection: Max chooses the move corresponding to the highest backed-up value.
Note: This relies on the assumption that the opponent plays perfectly. We also define a ply as one turn taken by one player.
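The steps above can be condensed into a few lines of code. The tree below is hand-built for illustration (inner lists are internal nodes, numbers are terminal values), not taken from a real game:

```python
# Minimax over a small hand-built game tree. Leaves hold terminal values
# from Max's point of view: +1 win, 0 tie, -1 loss.

def minimax(node, maximizing):
    """Back terminal values up the tree; returns the outcome Max can
    guarantee if both players play optimally."""
    if not isinstance(node, list):          # leaf: a terminal value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Max moves at the root; Min replies one ply below.
tree = [[-1, 0], [1, 0], [1, 1]]
print(minimax(tree, True))   # the third move guarantees Max a win
```

Note how Min's turn flips `max` to `min`: that alternation is exactly the AND/OR structure of the conditional plan described earlier.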
Properties of Minimax
Time Complexity: O(b^d)
Space Complexity: O(d) (assuming Depth-First Search is used)
Optimality: If the game is finite and time allows for a full tree search, the algorithm is both complete and optimal.
The Problem of Scale
Searching the full tree is rarely feasible for complex games due to the sheer size of the state space.
Chess: Branching factor b = 35. A typical game lasts ~100 plies (50 moves per player). Nodes to search: 35^100 ≈ 2 * 10^154.
Go: Branching factor b = 360. Games often last ~180 plies (90 moves per player). The search space is astronomically large, exceeding standard calculation limits.
Limited Scope Search
To handle these scales, we modify the algorithm to cut off the search:
Search only to a fixed depth.
At each leaf node of this limited tree, evaluate the state using an evaluation function.
Back these heuristic values up the tree as if they were true terminal values.
The Evaluation Function
The key to success is the quality of the evaluation function (e.g., for Checkers, Othello, Chess, or Mancala).
It must be a good predictor of the actual value of a state.
It must be computationally fast.
Weighted linear functions are often a good choice.
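A minimal sketch of such a weighted linear function, with made-up chess-flavored features and weights (none of these numbers are tuned or canonical):

```python
# A weighted linear evaluation function. Each feature is measured as
# (Max's count minus Min's count); weights are illustrative pawn-units.

def evaluate(features, weights):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ...: fast to compute, and a decent
    predictor of the true value when the features are well chosen."""
    return sum(w * f for w, f in zip(weights, features))

weights  = [9.0, 0.1, 0.5]     # material, mobility, king safety
features = [1, 12, -2]         # up one pawn, 12 extra moves, weaker king
print(evaluate(features, weights))   # roughly 9.2 in Max's favor
```

The linear form keeps the function fast; its quality then rests entirely on how predictive the chosen features and weights are.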
Quiescence (When to Stop)
If the evaluation function is unstable (e.g., in the middle of a piece exchange), we should not stop searching.
We should continue the search until the state is quiescent (stable), ensuring the evaluation is reliable.
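One way to sketch the idea: represent each state as a static evaluation plus a list of "noisy" successors (e.g., mid-exchange continuations), and keep searching until no noise remains. This representation is invented here purely for illustration:

```python
# Quiescence sketch: a state is a (static_value, noisy_successors) pair.
# We trust the static evaluation only once the position is quiescent,
# i.e., there are no noisy moves (such as pending recaptures) left.

def quiesce(state, maximizing):
    value, noisy = state
    if not noisy:                          # quiescent: evaluation is reliable
        return value
    vals = [quiesce(s, not maximizing) for s in noisy]
    return max(vals) if maximizing else min(vals)

# Max's capture looks great statically (+5), but Min's recapture reveals -3.
mid_exchange = (5, [(-3, [])])
print(quiesce(mid_exchange, False))    # Min to move: -3, not the misleading +5
```

Stopping at the +5 would have been an evaluation error; extending the search through the exchange recovers the true value.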
Alpha-Beta Pruning
Theoretically, the deeper we search, the better the program performs. If we can eliminate (prune) parts of the Minimax tree that do not affect the final decision, we gain time to search deeper in promising branches. However, we must ensure we never prune a path that could lead to an optimal solution.
Definitions
Alpha (a): The current best choice for Max among the values found so far on the path from the current node to the root.
Beta (b): The current best choice for Min among the values found so far on the path from the current node to the root.
The Pruning Rule
Stop searching the children of a node (prune the sub-tree) when b <= a.
The Algorithm
alpha-beta(State s)
    return ab-max(s, neg-inf, pos-inf)

ab-max(State s, val a, val b)
    if s is leaf
        return val(s)
    for each child c in children(s)
        v <- ab-min(c, a, b)
        if v > a
            a <- v
        if b <= a
            return a        // prune: Min will never allow this branch
    return a

ab-min(State s, val a, val b)
    if s is leaf
        return val(s)
    for each child c in children(s)
        v <- ab-max(c, a, b)
        if v < b
            b <- v
        if b <= a
            return b        // prune: Max will never allow this branch
    return b
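The pseudocode translates almost line-for-line into Python. Here the tree is given as nested lists with numeric leaves; the sample values are hand-picked for illustration:

```python
# Direct Python transcription of ab-max / ab-min over nested lists
# (inner lists are internal nodes, numbers are leaf values).
import math

def ab_max(s, a, b):
    if not isinstance(s, list):            # leaf
        return s
    for c in s:
        v = ab_min(c, a, b)
        if v > a:
            a = v
        if b <= a:                         # Min has a better option elsewhere
            return a
    return a

def ab_min(s, a, b):
    if not isinstance(s, list):
        return s
    for c in s:
        v = ab_max(c, a, b)
        if v < b:
            b = v
        if b <= a:                         # Max has a better option elsewhere
            return b
    return b

def alpha_beta(s):
    return ab_max(s, -math.inf, math.inf)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alpha_beta(tree))   # 3: the same answer plain Minimax gives
```

On this tree the middle branch is cut off after its first leaf (2 <= a = 3), so the answer is identical to Minimax's while visiting fewer nodes.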
Efficiency Analysis
The effectiveness of Alpha-Beta pruning depends heavily on the order in which nodes are examined.
Optimal Ordering: If nodes are ordered perfectly, the number of nodes searched reduces from O(b^d) to O(b^{d/2}). This implies we can search twice as deep in the same amount of time.
Random Ordering: A typical random ordering results in O(b^{3d/4}), allowing us to search roughly one-third deeper.
Practical Approach: While perfect ordering is impossible (as it requires solving the game), clever application of problem-specific knowledge can bring performance within a constant factor of optimal.
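The effect of ordering can be demonstrated empirically: searching the same tree with its best branch first versus last changes how many leaves get evaluated. A small sketch (the tree values are arbitrary, chosen only to show the effect):

```python
# Count leaf evaluations under alpha-beta for two orderings of one tree.
import math

def ab(s, a, b, maximizing, counter):
    if not isinstance(s, list):
        counter[0] += 1                    # count each leaf actually evaluated
        return s
    if maximizing:
        for c in s:
            a = max(a, ab(c, a, b, False, counter))
            if b <= a:
                break                      # prune remaining children
        return a
    else:
        for c in s:
            b = min(b, ab(c, a, b, True, counter))
            if b <= a:
                break
        return b

def count_leaves(tree):
    counter = [0]
    value = ab(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]

good = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # best branch examined first
bad  = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]   # best branch examined last
print(count_leaves(good), count_leaves(bad))  # (3, 7) (3, 9): same value
```

Both orderings return the same root value; the well-ordered tree simply prunes earlier. At realistic depths this gap is what turns O(b^d) toward O(b^{d/2}).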