What is a decision tree?
A decision tree is a predictive model that maps decisions and their possible outcomes in a tree-like structure.
It is used for both classification and regression tasks.
Consider the following dataset. It has two features, weather and temperature, and a yes/no target: whether we play outside.
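Weather     Temperature   Play Outside
Sunny       Hot           No
Sunny       Mild          Yes
Rainy       Cool          No
Rainy       Mild          No
Overcast    Cool          Yes
Overcast    Mild          Yes
(This is the same data that the Julia implementation below encodes numerically.)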
What is entropy?
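Entropy measures how impure a set of labels is: it is 0 when every example belongs to the same class and largest when the classes are evenly mixed. For a set S in which class c appears with proportion p_c,
H(S) = - Σ_c p_c · log2(p_c)
For two classes split 50/50, H(S) = 1 bit; for a pure set, H(S) = 0.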
Information gain measures how much information a split gives us about the target, and it is computed from entropy.
We take the entropy of the data before the split and subtract the weighted entropy of each partition produced by the split.
We then select the split that yields the largest reduction in entropy, or equivalently, the largest increase in information.
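Concretely, for a split of S into partitions S_1, …, S_k,
IG(S, split) = H(S) - Σ_i (|S_i| / |S|) · H(S_i)
that is, the entropy before the split minus the weighted average entropy of the partitions.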
Let us build a decision tree step by step
Step 1: Calculate Root Node Entropy
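The dataset above has three Yes and three No examples at the root, so
H(root) = -(3/6)·log2(3/6) - (3/6)·log2(3/6) = 1 bit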
Step 2: Calculate information gain for each feature
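For this hand calculation, each value of a feature forms its own branch. Splitting on weather gives Sunny = {No, Yes} (entropy 1), Rainy = {No, No} (entropy 0) and Overcast = {Yes, Yes} (entropy 0):
IG(root, weather) = 1 - [(2/6)·1 + (2/6)·0 + (2/6)·0] ≈ 0.667
Splitting on temperature gives Hot = {No} (entropy 0), Mild = {Yes, No, Yes} (entropy ≈ 0.918) and Cool = {No, Yes} (entropy 1):
IG(root, temperature) = 1 - [(1/6)·0 + (3/6)·0.918 + (2/6)·1] ≈ 0.208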
Step 3: Choose the best feature based on information gain
IG(root, weather) = 0.667
IG(root, temperature) = 0.208
Thus weather is the best feature because it has the higher information gain.
Step 4: Split the data
Splitting on weather gives three branches: the rainy branch is pure (all No), the overcast branch is pure (all Yes), and the sunny branch still contains one Yes and one No.
Step 5: Find the best feature at the impure internal node
The sunny branch is still impure, so we compute the information gain of the remaining feature, temperature, on that subset and split on it.
Step 6: Repeat these steps until every leaf is pure or a stopping condition (such as a maximum depth or a minimum number of samples per node) is reached
Implementation in Julia
Import package
StatsBase is used to compute the mode (the majority class) at the leaves.
using StatsBase
Dataset definition
We encode the three weather conditions as the numbers 1, 2, 3 and the three temperatures likewise as 1, 2, 3. The first column of X is weather and the second is temperature.
# Features: Weather (1=Sunny, 2=Rainy, 3=Overcast), Temperature (1=Hot, 2=Mild, 3=Cool)
# Labels: Play Outside (0=No, 1=Yes)
X = [
1 1;
1 2;
2 3;
2 2;
3 3;
3 2
]
println(typeof(X)) # Matrix{Int64} on a typical 64-bit system
y = [0, 1, 0, 0, 1, 1] # Labels (Play Outside: No or Yes)
Function for entropy calculation
function entropy(labels)
n = length(labels)
if n == 0
return 0.0 # Entropy is 0 for an empty set
end
probs = [count(labels .== c) / n for c in unique(labels)] # proportion of each class
return -sum(p * log2(p) for p in probs if p > 0) # Shannon entropy in bits
end
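As a quick sanity check with the dataset above (three No and three Yes), the root entropy should come out to exactly 1 bit:
println(entropy(y))      # 1.0 (an even 3/3 split of the labels)
println(entropy(y[1:4])) # ≈ 0.811 (three No and one Yes)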
Function for information gain calculation
In this code block, left_y holds the labels that go down the left branch of the node and right_y holds the labels that go down the right branch.
function information_gain(X, y, feature_index, threshold)
# Perform element-wise comparison for splitting
left_indices = findall(X[:, feature_index] .<= threshold)
right_indices = findall(X[:, feature_index] .> threshold)
left_y = y[left_indices]
right_y = y[right_indices]
# Calculate entropy before the split
entropy_before = entropy(y)
# Calculate entropy after the split
n = length(y)
entropy_after = (length(left_y) / n) * entropy(left_y) + (length(right_y) / n) * entropy(right_y)
# Information gain is the reduction in entropy
return entropy_before - entropy_after
end
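As a quick check with the dataset above (note that this function splits on a numeric threshold, so the gain for weather is not the same as the three-way split computed by hand in Step 2):
println(information_gain(X, y, 1, 2)) # ≈ 0.459 (weather <= 2 vs. weather > 2)
println(information_gain(X, y, 2, 1)) # ≈ 0.191 (temperature <= 1 vs. temperature > 1)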
Find the best split
Finding the best split means finding both the best feature and the best threshold. For example, splitting feature 1 (weather) at the value 1 separates sunny days (weather = 1) from the rest.
function best_split(X, y)
best_gain = -Inf
best_feature = -1
best_threshold = -1
n_features = size(X, 2)
for feature_index in 1:n_features
thresholds = unique(X[:, feature_index]) # Test each unique value in the feature
for threshold in thresholds
gain = information_gain(X, y, feature_index, threshold)
if gain > best_gain
best_gain = gain
best_feature = feature_index
best_threshold = threshold
end
end
end
return best_feature, best_threshold, best_gain
end
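On the full dataset this should pick the weather feature with threshold 2, i.e. the split checked above:
println(best_split(X, y)) # (1, 2, 0.459...), i.e. feature 1 (weather) at threshold 2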
Code for building decision tree
Tree building stops when the depth reaches the maximum depth, when the entropy of a node becomes zero, or when a node contains one data point or none. Also note that build_tree calls itself recursively until a stopping criterion is reached.
function build_tree(X, y, depth=0, max_depth=3)
# Check stopping conditions
if entropy(y) == 0 || length(y) <= 1 || depth >= max_depth
# Return a leaf node with the majority class
return mode(y)
end
# Find the best split
feature_index, threshold, gain = best_split(X, y)
if gain == 0 # No further information can be gained
return mode(y)
end
# Split the dataset
left_indices = findall(X[:, feature_index] .<= threshold)
right_indices = findall(X[:, feature_index] .> threshold)
left_X = X[left_indices, :]
right_X = X[right_indices, :]
left_y = y[left_indices]
right_y = y[right_indices]
# Build the left and right subtrees recursively
left_tree = build_tree(left_X, left_y, depth + 1, max_depth)
right_tree = build_tree(right_X, right_y, depth + 1, max_depth)
# Return the decision node
return Dict(
:feature_index => feature_index,
:threshold => threshold,
:left => left_tree,
:right => right_tree
)
end
Train and print the decision tree
tree = build_tree(X, y)
# Print the decision tree
println(tree)
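The printed tree is a nested Dict for internal nodes and an integer class label at the leaves. As a minimal sketch of how it could be used (the predict helper below is not part of the original code), one can walk the Dict until a leaf is reached:
# Hypothetical helper: classify one sample by walking the Dict-based tree
function predict(node, x)
    node isa Dict || return node # a leaf stores the majority class directly
    if x[node[:feature_index]] <= node[:threshold]
        return predict(node[:left], x)
    else
        return predict(node[:right], x)
    end
end
# Predictions on the training data should reproduce y = [0, 1, 0, 0, 1, 1]
println([predict(tree, X[i, :]) for i in 1:size(X, 1)])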