Probability and Expectimax
Assumptions Thus Far
Up until this point, our exploration of games and search algorithms (such as Minimax) has focused on a specific subset of games. Specifically, we have looked at games that rely on two key assumptions:
Deterministic: If an agent takes a specific action, the result is known and certain. The same move from the same board position always leads to the same successor state.
Perfect Information: The agent can observe the complete game state, including everything about the opponent; there is no hidden information.
However, many real-world games do not fit these criteria. We must expand our algorithms to handle games that are:
Stochastic (Random): Outcomes are based on chance elements, such as rolling dice or drawing cards.
Partially Observable: The agent does not know the full state of the game (e.g., Poker, where an opponent’s hand is hidden).
Fundamentals of Probability
To handle stochastic environments, we must introduce concepts from probability theory.
Random Variables
A random variable represents an uncertain quantity with a set of possible outcomes. Examples include the roll of a die or the winner of a sports match.
Uncertainty vs. True Randomness
It is important to distinguish between true randomness and uncertainty.
Some events, like dice rolls, are (almost) truly random.
Other events, like the outcome of a soccer game, represent uncertainty. We classify them as random because we lack the information to predict them with complete confidence. For example, if we knew the precise position, air resistance, and force applied to a die, we could theoretically predict its motion and the resulting face. Because we do not, we model it as a random variable.
Distributions
A probability distribution assigns a probability to each possible outcome. A fundamental rule is that the probabilities of all possible outcomes must sum to 1.
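As an illustration, a distribution can be stored as a mapping from outcomes to probabilities; the biased-coin numbers below are made up for the example, not taken from the notes:

```python
# A probability distribution as a mapping from outcomes to probabilities
# (the biased-coin values are illustrative).
coin = {"heads": 0.6, "tails": 0.4}

# Every probability lies in [0, 1] ...
assert all(0.0 <= p <= 1.0 for p in coin.values())
# ... and the total probability mass is 1.
assert abs(sum(coin.values()) - 1.0) < 1e-9
```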
Expectation and the Expectimax Algorithm
Expectation
Expectation is a property of a distribution. When the outcomes are numeric, the expectation is the weighted average of those outcomes: multiply each outcome's value by the probability of that outcome occurring and sum the results.
The expected value of a discrete random variable $X$ is:

$$E[X] = \sum_{x} x \cdot P(X = x)$$
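As a quick check of the definition, a fair six-sided die has expectation 3.5; a minimal Python sketch:

```python
# Distribution of a fair six-sided die: each face has probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}

# E[X] = sum over outcomes of x * P(X = x).
expected = sum(x * p for x, p in die.items())
print(expected)  # 3.5
```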
The Expectimax Algorithm
How do we adapt our search trees when random events occur? We utilize the Expectimax algorithm.
In a standard search tree, we alternate between Maximizing and Minimizing layers. In Expectimax, we introduce a new type of node: the Expectation Node (or Chance Node).
Function: Instead of taking the min or max of its children, an expectation node computes the weighted average (the expectation) of the children’s values.
Requirements: To compute this, the agent must know the probability distribution across all possible outcomes.
This structure closely resembles Minimax, with chance layers taking the place of minimizing layers:
Root (depth 1): Max node.
Depth 2: Expectation nodes (representing the chance event that follows each move).
Depth 3: Max nodes (if the agent moves again).
...
Leaves: At the leaf nodes, we apply the utility or board-evaluation function.
Note: Probabilistic search can yield counter-intuitive results compared to strict Minimax. Depending on the distribution, the optimal strategy may favor a lower guaranteed score over a high-risk, high-reward option, or the reverse.
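The algorithm can be sketched in a few lines of Python. The tuple-based tree encoding below is a hypothetical choice for illustration: a leaf is a number, a max node is `("max", children)`, and a chance node is `("chance", [(prob, child), ...])`:

```python
def expectimax(node):
    """Return the Expectimax value of a node in a hand-built game tree."""
    if isinstance(node, (int, float)):  # leaf: board-evaluation value
        return node
    kind, children = node
    if kind == "max":                   # agent picks the best child
        return max(expectimax(child) for child in children)
    # chance node: probability-weighted average of the children's values
    return sum(p * expectimax(child) for p, child in children)

# The agent chooses between a safe move and a gamble:
safe = ("chance", [(1.0, 4)])               # guaranteed value 4
gamble = ("chance", [(0.5, 0), (0.5, 10)])  # expectation 0.5*0 + 0.5*10 = 5
print(expectimax(("max", [safe, gamble])))  # 5.0
```

Note how this toy tree also shows the point above: Expectimax prefers the risky move with expectation 5 over the guaranteed 4, which a worst-case analysis would reject.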
Expecti-Minimax (Adversarial Games)
This approach can be applied to 2-player adversarial games involving chance, such as Backgammon. The tree structure alternates to account for the opponent:
Max Node (Agent move)
Chance Node (Dice roll/Environment)
Min Node (Opponent move)
Chance Node (Dice roll/Environment)
Regarding Pruning: While Alpha-Beta pruning is highly effective in Minimax, pruning in Expectimax trees is mathematically more complex. (See course problem set for details).
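A minimal sketch of Expecti-Minimax, using the same hypothetical tuple encoding as above with an added `"min"` node type for the opponent:

```python
def expectiminimax(node):
    """Value of a node in a game tree with max, min, and chance layers."""
    if isinstance(node, (int, float)):  # leaf evaluation
        return node
    kind, children = node
    if kind == "max":                   # agent's move
        return max(expectiminimax(c) for c in children)
    if kind == "min":                   # opponent's move
        return min(expectiminimax(c) for c in children)
    # chance node: dice roll / environment
    return sum(p * expectiminimax(c) for p, c in children)

# Max -> dice roll -> opponent's reply, as in Backgammon-style games:
roll = ("chance", [(0.5, ("min", [3, 7])), (0.5, ("min", [2, 9]))])
print(expectiminimax(("max", [roll])))  # 0.5*3 + 0.5*2 = 2.5
```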
Advanced Concepts
Partial Observability
In games with hidden information, we cannot start from a known state.
We can model this by placing a Chance Node at the root of the tree to represent the uncertainty of the initial state.
Monte Carlo Methods: In scenarios with millions of possible states where computing the exact distribution is impossible, we use Monte Carlo simulations to estimate the probability distribution.
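To make the Monte Carlo idea concrete, the sketch below estimates the distribution of the sum of two dice by sampling rather than enumerating outcomes; the two-dice setup is an illustrative stand-in for a game state too large to enumerate:

```python
import random

# Monte Carlo estimate of the distribution of the sum of two fair dice.
random.seed(0)  # fixed seed so the sketch is reproducible
N = 100_000
counts = {}
for _ in range(N):
    s = random.randint(1, 6) + random.randint(1, 6)
    counts[s] = counts.get(s, 0) + 1

# Relative frequencies approximate the true probabilities.
estimate = {s: c / N for s, c in sorted(counts.items())}
# The exact value of P(sum = 7) is 6/36; the estimate should be close.
print(round(estimate[7], 2))
```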
Multiplayer Games
When more than two players are involved, the zero-sum property often no longer holds.
Instead of a single scalar value, we maintain a vector of goodness (one value per player).
Each node maximizes the component of the vector corresponding to the player whose turn it is.
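This scheme (often called max^n) can be sketched as follows; the tree encoding is hypothetical: a leaf is a list of utilities, one per player, and an internal node is `(player_index, children)`:

```python
def maxn(node):
    """Multiplayer search: each node maximizes its own vector component."""
    if isinstance(node, list):  # leaf: one utility value per player
        return node
    player, children = node
    # The player to move picks the child whose utility vector is best
    # in their own component.
    return max((maxn(c) for c in children), key=lambda u: u[player])

# A 3-player toy tree:
subtree_a = (2, [[3, 5, 2], [6, 1, 4]])  # player 2 to move
subtree_b = (1, [[2, 8, 1], [5, 2, 9]])  # player 1 to move
root = (0, [subtree_a, subtree_b])       # player 0 to move
print(maxn(root))  # [6, 1, 4]
```

Player 2 picks `[6, 1, 4]` (4 > 2 in their component), player 1 picks `[2, 8, 1]` (8 > 2), and player 0 then prefers the branch giving them 6 over 2.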