
Decision Trees

Strictly speaking, a Decision Tree (DT) is not a machine learning algorithm itself, but rather a decision tool; various machine learning algorithms exist for learning such trees from data.

How a Decision Tree Works

A decision tree arranges a series of questions in a tree structure. In the testing phase, when an input comes in, the process starts at the root: you ask the question at each node, draw the answer from the input, and follow the branch corresponding to that answer. Here is an example tree:

Decision Tree Example
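
As a sketch of this traversal in Python (the `Node` layout, field names, and `classify` helper below are illustrative assumptions, not taken from the notes):

```python
class Node:
    """An internal node holds a question; a leaf holds only a label."""
    def __init__(self, question=None, children=None, label=None):
        self.question = question        # feature to test, e.g. "price"
        self.children = children or {}  # maps each answer to a child Node
        self.label = label              # set only on leaf nodes

def classify(node, example):
    """Start at the root and follow answers down to a leaf."""
    while node.label is None:
        answer = example[node.question]  # draw the answer from the input
        node = node.children[answer]     # move to the matching subtree
    return node.label

# Hypothetical tree and input, for illustration only:
tree = Node("price", {
    "low": Node(label="yes"),
    "high": Node("reliability", {
        "high": Node(label="yes"),
        "low": Node(label="no"),
    }),
})
classify(tree, {"price": "high", "reliability": "low"})  # -> "no"
```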

Each path through the tree represents a logical statement in which every question serves as a predicate in a boolean formula.
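
For example, one root-to-leaf path might encode the conjunction (with illustrative feature tests, not the figure's actual questions):

$$(\text{price} = \text{low}) \land (\text{reliability} = \text{high}) \Rightarrow \text{yes}$$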

These trees are highly expressive (capable of representing any boolean statement), compact (usually smaller than a truth table), and intuitive for human users.

Building the Tree

To generate a tree, you start with data, labels (e.g., yes/no or 0/1), and a set of possible predicates. Every example in the training set has a value for every feature, such as reliability, price, color, speed, and RAM.
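
As a concrete sketch, such a training set could be represented as a list of dicts in Python (all values below are invented for illustration):

```python
# Each example has every feature filled in, plus a yes/no label.
examples = [
    {"reliability": "high", "price": "low",  "color": "red",
     "speed": "fast", "ram": "8GB",  "label": "yes"},
    {"reliability": "low",  "price": "high", "color": "blue",
     "speed": "slow", "ram": "4GB",  "label": "no"},
    {"reliability": "high", "price": "high", "color": "red",
     "speed": "fast", "ram": "16GB", "label": "yes"},
]
```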

The Minimal Tree Concept

While a trivial method would be to build a path for every individual data item, this results in a massive tree that fails to generalize to new data. Instead, the goal is to find the minimal tree that still characterizes the data.

This is achieved by placing the “best” predicate at the root. A perfect predicate is one that divides the data set into pure groups of just “yes” or just “no”.

Measuring Purity with Entropy

To measure the purity of a set, we use a concept from information theory called Entropy. It is calculated as:

$$E(S) = \sum_{i \in c} -p_i \log_2 p_i$$

Where:

- $S$ is the set of examples,
- $c$ is the set of classes (the possible labels), and
- $p_i$ is the proportion of examples in $S$ that belong to class $i$.

If a two-class set is evenly divided, the entropy is 1; if the set is perfectly pure, the entropy is 0.
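
A direct Python translation (a minimal sketch; the function name and label format are my own choices):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    # Counter only yields classes that actually occur, so the
    # 0 * log 0 case never arises.
    return sum(-(c / n) * math.log2(c / n)
               for c in Counter(labels).values())

entropy(["yes", "no", "yes", "no"])  # evenly divided -> 1.0
entropy(["yes", "yes", "yes"])       # perfectly pure -> 0.0
```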

Information Gain

Once purity is measurable, we can calculate the Information Gain ($G$) of splitting a set $S$ based on a predicate $P$:

$$G(S, P) = E(S) - \sum_{v \in V_P} \frac{|S_v|}{|S|} E(S_v)$$

This represents the entropy of $S$ minus the weighted sum of the entropies of the resulting subsets, where $V_P$ is the set of possible answers to $P$ and $S_v$ contains the examples whose answer is $v$. We select the predicate that yields the highest information gain and repeat the process on each split subset until every branch is pure. This ultimately determines the order of the questions and the shape of the tree.
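
Continuing the sketch (reusing the `entropy` function and dict-style `examples` from above; the helper names are mine, not the notes'):

```python
def information_gain(examples, feature):
    """E(S) minus the weighted entropies of the subsets S_v."""
    labels = [ex["label"] for ex in examples]
    splits = {}
    for ex in examples:  # partition S by each example's answer v
        splits.setdefault(ex[feature], []).append(ex["label"])
    weighted = sum(len(s) / len(labels) * entropy(s)
                   for s in splits.values())
    return entropy(labels) - weighted

def best_feature(examples, features):
    """Greedy step: pick the predicate with the highest gain."""
    return max(features, key=lambda f: information_gain(examples, f))

best_feature(examples, ["reliability", "price", "color", "speed", "ram"])
```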

Potential Challenges