Forecasting with Tree-Based Models for Time Series
These articles are AI-generated summaries. Please check the original sources for full details.
Introduction
Decision tree-based models are versatile tools in machine learning, commonly used for classification and regression on structured data, but also applicable to time series data with appropriate feature engineering. This article details how to leverage decision trees for time series forecasting by extracting lagged features and rolling statistics from raw time series data.
Building Decision Trees for Time Series Forecasting
The article utilizes the monthly airline passengers dataset from the sktime library to demonstrate a practical approach to time series forecasting using decision trees. The core idea is to transform the time series into a supervised learning problem by creating features that represent past values and trends.
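As a minimal sketch of that transformation, the snippet below turns a short toy series into a table where each row pairs a target value with the values that preceded it. The numbers are made up for illustration, and the names `frame`, `supervised`, `X`, and `target` are assumptions rather than anything from the original article.

```python
import pandas as pd

# Toy series standing in for monthly observations (values are made up).
series = pd.Series([10, 12, 15, 11, 14, 18], name="y")

# Each lag column holds the value observed `lag` steps earlier.
frame = pd.DataFrame({"y": series})
for lag in (1, 2):
    frame[f"lag_{lag}"] = frame["y"].shift(lag)

# The first rows lack a full history, so they are dropped.
supervised = frame.dropna()

# Features are the lag columns; the target is the current value.
X, target = supervised.drop(columns="y"), supervised["y"]
print(supervised)
```

Each remaining row can be treated like an ordinary tabular training example, which is what lets a standard regressor be applied to the series.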
Key Insights
- Lagged Features: Creating lagged features allows the model to learn dependencies between past and present values.
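- Rolling Statistics: Rolling mean and standard deviation capture the recent level and variability of the series. Computing them on the series shifted by one step keeps the current target out of the window, which is what avoids data leakage.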
- sktime Library: Provides convenient access to time series datasets for experimentation and model building.
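The role of the one-step shift in the rolling features can be illustrated on a toy series. This is a hedged sketch with made-up values; `leaky_mean` and `safe_mean` are illustrative names, not part of the original article.

```python
import pandas as pd

# Toy series (values are made up for illustration).
series = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Leaky: the window ending at row t includes the value at row t itself,
# which is the value the model is supposed to predict.
leaky_mean = series.rolling(3).mean()

# Safe: shifting first means the window at row t ends at row t-1.
safe_mean = series.shift(1).rolling(3).mean()

print(pd.DataFrame({"y": series, "leaky": leaky_mean, "safe": safe_mean}))
```

In the unshifted column, each rolling mean already contains the target for that row, so a model trained on it would see part of the answer.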
Working Example
import pandas as pd
from sktime.datasets import load_airline
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Load the airline passenger dataset
y = load_airline()

# Function to create lagged features and rolling statistics
def make_lagged_df_with_rolling(series, lags=12, roll_window=3):
    df = pd.DataFrame({"y": series})
    for lag in range(1, lags+1):
        df[f"lag_{lag}"] = df["y"].shift(lag)
    # Shift by one step before rolling so the window excludes the current target
    df[f"roll_mean_{roll_window}"] = df["y"].shift(1).rolling(roll_window).mean()
    df[f"roll_std_{roll_window}"] = df["y"].shift(1).rolling(roll_window).std()
    return df.dropna()

# Create the feature dataframe
df_features = make_lagged_df_with_rolling(y, lags=12, roll_window=3)

# Split the data into training and testing sets (chronological, no shuffling)
train_size = int(len(df_features) * 0.8)
train, test = df_features.iloc[:train_size], df_features.iloc[train_size:]
X_train, y_train = train.drop("y", axis=1), train["y"]
X_test, y_test = test.drop("y", axis=1), test["y"]

# Train a Decision Tree Regressor
dt_reg = DecisionTreeRegressor(max_depth=5, random_state=42)
dt_reg.fit(X_train, y_train)
y_pred = dt_reg.predict(X_test)

# Evaluate the model
print("Forecasting:")
print("MAE:", mean_absolute_error(y_test, y_pred))
Practical Applications
- Demand Forecasting: Retail companies can use this approach to predict future product demand based on historical sales data.
- Pitfall: Letting future information leak into the features (for example, computing rolling statistics without first shifting the series) leads to overly optimistic performance estimates.
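One way to guard against this pitfall is to check that the features for a given time step do not change when the target at that step and all later values are altered. The sketch below is an illustrative check on synthetic data, not a method from the original article; `build_features`, `uses_future_info`, and the `shift_first` flag are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

def build_features(series, lags=3, roll_window=3, shift_first=True):
    """Lag and rolling-mean features; shift_first=False is deliberately leaky."""
    df = pd.DataFrame({"y": series})
    for lag in range(1, lags + 1):
        df[f"lag_{lag}"] = df["y"].shift(lag)
    base = df["y"].shift(1) if shift_first else df["y"]
    df[f"roll_mean_{roll_window}"] = base.rolling(roll_window).mean()
    return df.drop(columns="y")

def uses_future_info(feature_fn, series, t):
    """Return True if the features at position t change when y[t:] is altered."""
    perturbed = series.copy()
    perturbed.iloc[t:] = perturbed.iloc[t:] + 1000.0
    before = feature_fn(series).iloc[t]
    after = feature_fn(perturbed).iloc[t]
    return not np.allclose(before, after, equal_nan=True)

# Synthetic series used only to exercise the check.
rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=30))

safe_result = uses_future_info(build_features, y, t=10)
leaky_result = uses_future_info(
    lambda s: build_features(s, shift_first=False), y, t=10
)
print("shifted features leak:", safe_result)
print("unshifted features leak:", leaky_result)
```

A check like this can be run against any feature function before training, since it depends only on the features and not on the model.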