Bayesian Belief Networks
Let’s explore a simple set of problems revolving around the everyday task of starting a car.

In a Bayesian network, we use nodes to represent variables—such as the Battery, Radio, Ignition, Oxygen, Gas, Starts, and Moves. We use arrows to indicate dependence, so the lack of an arrow between two variables implies they are independent.
Understanding Independence¶
A key concept here is conditional independence. Let’s look at the battery-radio-ignition triad. Imagine we know for a fact that the battery is fine; that gives us a certain belief about whether or not there will be ignition.
If we then turn on the radio, does knowing whether or not the radio works tell us anything additional about the ignition? No. Therefore, they are conditionally independent given the Battery, which we can write as:
Sometimes, multiple factors relate two variables. For instance, let’s add Wiring to our model. Look at the graph above, but add another top node next to Battery for Wiring. It has the same two arrows as Battery, connecting to Radio and Ignition. With this addition, knowing that the battery is good doesn’t make the radio and ignition independent anymore. Even if the battery is fine, a broken radio might suggest the wiring at the battery is bad, which would then tell us not to expect ignition. However, if we know that both the battery and the wiring are good, then the radio and ignition are once again independent:
Since that specific model is a bit of a stretch, let’s ignore wiring for now and look at the Battery-Ignition-Starts trio. Are there any independences there? If we know that the ignition is working, being told the battery works gives us no new information on whether the car starts. This is expressed as:
Now, what about the Ignition-Starts-Gas triad? Normally, Ignition and Gas are independent. But imagine you know the car didn’t start. That knowledge will lead you to beliefs about both the gas and the ignition. If I then tell you there was no gas, it changes your belief about the ignition, doesn’t it? After all, now that we know the likely cause of the failure to start, we should reset our belief on the ignition to the prior. Thus, Ignition and Gas are actually dependent when we know “Starts.”
The General Rules¶
We can generalize these observations into two main rules:
A variable is independent of its non-descendants given its parents.
A variable is independent of all other variables given its Markov Blanket. The Markov Blanket is the set containing the variable’s parents, its children, and its children’s parents.
Building and Calculating Networks¶
Where do the networks come from? We build them, of course, but the easiest way to build these networks is causally. This method generates networks with the fewest links and produces conditional probabilities that are the easiest for people to understand.
With these ideas and some assumptions, we can build a general system to answer questions when we have evidence. For example:
If the car doesn’t start and the radio won’t play, what is the probability that the battery is at fault?
The basic technique is to enumerate through all the possible situations that can occur using the rule:
We also note that for any conjunction:
By applying conditional independence, we can make the computation much more efficient. In our car example, I won’t show all the work with the independence assumptions, but this distribution can be reduced to:
You might be asking why some variables are summed and others not. The ones that are not summed are the given evidence variables. The car does not start (s) and the radio won’t play (r), so what’s the probability that the battery is broken (b)? Those three (s,r,b) all have values. The rest do not, so we sum over all of their possible values.
This reduction depends on the order of our variables, but we don’t have to worry about finding the correct order, because the correct ordering always leads to the fact that every variable is independent of every other variable when given their parents. This equation is just following the edges of the graph!
OK, I cheated a little bit. Notice how I suddenly dropped the distribution notation and switched to individual notation. That’s just to keep things clear because we were mixing distributions for B,R, and S, but individual outcomes for i,o,g,m. To calculate the Distributions, you repeat this computation for each of the possible values that the distribution (such as B) can take.
The term is a normalization constant. To calculate the distributions, you repeat this computation for each possible value the distribution can take (like Battery being true or false):
. So all we do is calculate
and then
and then is the number that makes these add up to 1.
Complexity and Efficiency¶
To avoid re-computing the same values multiple times, we use dynamic programming or memoization to save results in a table. Typically, we evaluate these from right to left, starting with the leaf nodes .
Time Complexity¶
The time complexity of the algorithm depends on the structure of the network:
Polytrees (no cycles): Time complexity is linear with the size of the network.
Undirected cycles: Time is exponential (NP-hard).
Directed cycles: This is not well-formed, and the probability values would not be stable. Don’t do this!
Because exponential growth is slow, an important technique is to perform approximate inference by statistical sampling. While it doesn’t give you the exactly correct answer, it gets close and can be much faster. We won’t cover them here, but you should know they exist.
Application Problems¶
What sort of problems can you be asked for this topic?
I could give you a set of variables, and ask you to build a graph that makes sense. For example I could say, “build a graph that describes starting a care, using the following variables”:
i. The battery is good
ii. The radio works
iii. The ignition works
iv. There is air getting to the ignition point
v. The car has gas
vi. The car can drive...and you should come up with the example we started with above.
I could give you a network and ask you for the equation to calculate the probability of some particular variable given some knowledge.
I could give you a network plus the probability tables and ask you for the actual probability of some variable taking some value given some knowledge.