Why Decision Trees Fail (and How to Fix Them)

1. Overfitting: Memorizing the Data Rather Than Learning from It

Decision trees are powerful, but an unconstrained tree can overfit: it memorizes the training data instead of learning patterns that generalize. The result is excellent training performance and poor performance on unseen data. In a California Housing example, a tree with no depth constraint reached near-zero training error but a test RMSE of 0.727.
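For reference, the RMSE quoted above is the standard root mean squared error. The symbols below are notation introduced here, not taken from the original article: n is the number of test samples, y_i the true target, and \hat{y}_i the tree's prediction.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

In scikit-learn's copy of the California Housing data the target is the median house value in units of $100,000, so a test RMSE of 0.727 corresponds to a typical error of roughly $72,700 per district.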

Why This Matters

Training data is rarely a perfect sample of what a model will see in production. An overfit model performs well in controlled evaluation but can fail badly once deployed, and the cost shows up as incorrect predictions and the effort of retraining.

Key Insights

Overfitting is common: Decision trees are prone to overfitting, especially on complex datasets.
Regularization is key: Limiting tree depth or requiring a minimum number of samples per leaf curbs overfitting.
Scikit-learn ease: Scikit-learn exposes these constraints as simple hyperparameters, such as max_depth and min_samples_leaf.
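The link between tree depth and generalization can be sketched on a small synthetic problem. The noisy sine data, the depth values, and the rmse helper below are illustrative assumptions, not part of the original article; the point is only to show training error falling toward zero as depth grows while test error does not follow.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption, not the article's dataset):
# a noisy sine wave, so the signal and the noise level are both known.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(600, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=600)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def rmse(model, X, y):
    """Root mean squared error of the model's predictions on (X, y)."""
    return float(np.sqrt(mean_squared_error(y, model.predict(X))))


# max_depth=None lets the tree keep splitting until its leaves are pure.
results = {}
for depth in [2, 4, 6, 8, None]:
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = (rmse(tree, X_train, y_train), rmse(tree, X_test, y_test))
    print(f"max_depth={depth}: train RMSE={results[depth][0]:.3f}, "
          f"test RMSE={results[depth][1]:.3f}")
```

Reading the printed rows from top to bottom, the gap between the train and test columns is the quantity to watch: a widening gap is the signature of overfitting.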

Working Example

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Loading the dataset and splitting it into training and test sets
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Build a tree with no depth limit; it keeps splitting until it fits the training set almost exactly
overfit_tree = DecisionTreeRegressor(random_state=42)
overfit_tree.fit(X_train, y_train)
print("Unconstrained train RMSE:", np.sqrt(mean_squared_error(y_train, overfit_tree.predict(X_train))))
print("Unconstrained test RMSE:", np.sqrt(mean_squared_error(y_test, overfit_tree.predict(X_test))))

# Constrain growth up front (pre-pruning) with a depth cap and a minimum leaf size
pruned_tree = DecisionTreeRegressor(max_depth=6, min_samples_leaf=20, random_state=42)
pruned_tree.fit(X_train, y_train)
print("Constrained train RMSE:", np.sqrt(mean_squared_error(y_train, pruned_tree.predict(X_train))))
print("Constrained test RMSE:", np.sqrt(mean_squared_error(y_test, pruned_tree.predict(X_test))))
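One way to see what "memorizing" means structurally is to compare the size of the two kinds of tree. The sketch below uses synthetic data from make_friedman1 as a stand-in (an assumption made so the snippet runs without downloading California Housing) and reuses the same max_depth=6 and min_samples_leaf=20 settings as the example above.

```python
from sklearn.datasets import make_friedman1
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; the article uses California Housing).
X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)

# Same two configurations as the working example: no limits vs. constrained growth.
unconstrained = DecisionTreeRegressor(random_state=0).fit(X, y)
constrained = DecisionTreeRegressor(
    max_depth=6, min_samples_leaf=20, random_state=0
).fit(X, y)

for name, tree in [("unconstrained", unconstrained), ("constrained", constrained)]:
    n_leaves = tree.get_n_leaves()
    print(f"{name}: depth={tree.get_depth()}, leaves={n_leaves}, "
          f"avg samples per leaf={len(y) / n_leaves:.1f}")
```

An unconstrained regression tree on noisy targets tends to end up with close to one leaf per training row, which is a lookup table rather than a model; the constrained tree is forced to average over at least 20 rows per leaf.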

Practical Applications

Fraud Detection: A decision tree overfit to historical transaction data might fail to identify new fraud patterns.
Pitfall: Ignoring hyperparameter tuning and allowing trees to grow unconstrained.
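A minimal sketch of avoiding that pitfall is to let cross-validation choose the constraints instead of guessing them. The grid values, the 5-fold setting, and the synthetic make_friedman1 data are assumptions chosen for illustration, not settings from the original article.

```python
from sklearn.datasets import make_friedman1
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; swap in your own X and y).
X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical search grid: None and 1 reproduce the unconstrained tree,
# so the search can also conclude that no constraint is needed.
param_grid = {
    "max_depth": [3, 6, 9, None],
    "min_samples_leaf": [1, 5, 20],
}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores, hence the sign
    cv=5,
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated RMSE:", -search.best_score_)
print("Held-out test RMSE:", -search.score(X_test, y_test))
```

The test split is touched only once, after the search has finished, so the final RMSE is not biased by the tuning itself.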

References:

https://machinelearningmastery.com/why-decision-trees-fail-and-how-to-fix-them/
