Hierarchical Bayesian Regression with NumPyro: A JAX-Powered Workflow Guide
These articles are AI-generated summaries. Please check the original sources for full details.
A Coding Implementation of a Complete Hierarchical Bayesian Regression Workflow in NumPyro Using JAX-Powered Inference and Posterior Predictive Analysis
This tutorial walks through hierarchical Bayesian regression with NumPyro and JAX: synthetic data generation for 8 groups of 40 observations each, NUTS inference, and posterior predictive checks.
Why This Matters
Hierarchical Bayesian models balance global trends with group-specific variation, but real-world data is often noisy and non-linear in ways that strain idealized assumptions. Ignoring group-level heterogeneity can lead to overgeneralized inferences, while poorly chosen priors on the group-level scales can distort the posterior or make it hard to sample. This workflow addresses both concerns with JAX-powered inference and posterior predictive validation.
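For reference, the model fitted in the Working Example can be written out as follows. This is a transcription of the priors and likelihood that appear in the code, not an additional specification; the second argument of each Normal is a standard deviation (NumPyro's convention), g[i] denotes the group of observation i, and G = 8 in the synthetic data.

```latex
\begin{aligned}
\mu_\alpha,\ \mu_\beta &\sim \mathcal{N}(0,\ 5) \\
\sigma_\alpha,\ \sigma_\beta &\sim \mathrm{HalfCauchy}(2) \\
\alpha_g &\sim \mathcal{N}(\mu_\alpha,\ \sigma_\alpha), \qquad g = 1, \dots, G \\
\beta_g &\sim \mathcal{N}(\mu_\beta,\ \sigma_\beta), \qquad g = 1, \dots, G \\
\sigma_{\mathrm{obs}} &\sim \mathrm{Exponential}(1) \\
y_i &\sim \mathcal{N}\!\left(\alpha_{g[i]} + \beta_{g[i]}\, x_i,\ \sigma_{\mathrm{obs}}\right)
\end{aligned}
```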
Key Insights
- “Hierarchical models capture group-level variations with global parameters”: Structured priors allow sharing information across groups while respecting individual differences.
- “Posterior predictive checks validate model fit”: Visual comparisons between observed and simulated data reveal discrepancies in assumptions.
- “NumPyro leverages JAX for scalable Bayesian inference”: JAX’s automatic differentiation and vectorization enable efficient NUTS sampling on large datasets.
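The first insight, sharing information across groups, can be made concrete with a simplified case. The sketch below assumes an intercept-only Normal-Normal model in which the noise scale, the population mean, and the between-group scale are all known; the full model in this tutorial estimates them instead. The function name `partial_pooling_mean` and the group mean of 2.5 are hypothetical, while 0.7, 1.0, and 0.8 echo the noise scale, intercept, and group-intercept scale used to simulate the data.

```python
def partial_pooling_mean(group_mean, n_obs, sigma_obs, mu, tau):
    """Posterior mean of one group's intercept in a Normal-Normal model.

    Assumes sigma_obs (noise sd), mu (population mean) and tau
    (between-group sd) are known. Returns the estimate and the weight
    placed on the group's own sample mean.
    """
    precision_data = n_obs / sigma_obs**2   # information from the group's data
    precision_prior = 1.0 / tau**2          # information from the population
    weight = precision_data / (precision_data + precision_prior)
    estimate = weight * group_mean + (1.0 - weight) * mu
    return estimate, weight


# The same observed group mean is pulled toward mu more strongly
# when the group has fewer observations.
for n_obs in (2, 40, 400):
    est, w = partial_pooling_mean(
        group_mean=2.5, n_obs=n_obs, sigma_obs=0.7, mu=1.0, tau=0.8
    )
    print(f"n={n_obs:>3}: weight on group data={w:.3f}, estimate={est:.3f}")
```

The weight on the group's own data grows with the number of observations, which is the sense in which a hierarchical prior respects individual differences when the data supports them.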
Working Example
try:
    import numpyro
except ImportError:
    !pip install -q "llvmlite>=0.45.1" "numpyro[cpu]" matplotlib pandas
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import jax
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS, Predictive
from numpyro.diagnostics import hpdi
numpyro.set_host_device_count(1)
def generate_data(key, n_groups=8, n_per_group=40):
    # Independent PRNG keys for group intercepts, group slopes, x, and noise
    k1, k2, k3, k4 = random.split(key, 4)
    true_alpha = 1.0
    true_beta = 0.6
    sigma_alpha_g = 0.8
    sigma_beta_g = 0.5
    sigma_eps = 0.7
    group_ids = np.repeat(np.arange(n_groups), n_per_group)
    n = n_groups * n_per_group
    # Group-level deviations from the population intercept and slope
    alpha_g = random.normal(k1, (n_groups,)) * sigma_alpha_g
    beta_g = random.normal(k2, (n_groups,)) * sigma_beta_g
    x = random.normal(k3, (n,)) * 2.0
    eps = random.normal(k4, (n,)) * sigma_eps
    a = true_alpha + alpha_g[group_ids]
    b = true_beta + beta_g[group_ids]
    y = a + b * x + eps
    df = pd.DataFrame({"y": np.array(y), "x": np.array(x), "group": group_ids})
    truth = dict(true_alpha=true_alpha, true_beta=true_beta,
                 sigma_alpha_group=sigma_alpha_g, sigma_beta_group=sigma_beta_g,
                 sigma_eps=sigma_eps)
    return df, truth
key = random.PRNGKey(0)
df, truth = generate_data(key)
x = jnp.array(df["x"].values)
y = jnp.array(df["y"].values)
groups = jnp.array(df["group"].values)
n_groups = int(df["group"].nunique())
def hierarchical_regression_model(x, group_idx, n_groups, y=None):
    # Population-level means and between-group scales
    mu_alpha = numpyro.sample("mu_alpha", dist.Normal(0.0, 5.0))
    mu_beta = numpyro.sample("mu_beta", dist.Normal(0.0, 5.0))
    sigma_alpha = numpyro.sample("sigma_alpha", dist.HalfCauchy(2.0))
    sigma_beta = numpyro.sample("sigma_beta", dist.HalfCauchy(2.0))
    # Group-specific intercepts and slopes, drawn around the population means
    with numpyro.plate("group", n_groups):
        alpha_g = numpyro.sample("alpha_g", dist.Normal(mu_alpha, sigma_alpha))
        beta_g = numpyro.sample("beta_g", dist.Normal(mu_beta, sigma_beta))
    # Observation noise scale
    sigma_obs = numpyro.sample("sigma_obs", dist.Exponential(1.0))
    # Look up each observation's group intercept and slope
    alpha = alpha_g[group_idx]
    beta = beta_g[group_idx]
    mean = alpha + beta * x
    with numpyro.plate("data", x.shape[0]):
        numpyro.sample("y", dist.Normal(mean, sigma_obs), obs=y)
nuts = NUTS(hierarchical_regression_model, target_accept_prob=0.9)
mcmc = MCMC(nuts, num_warmup=1000, num_samples=1000, num_chains=1, progress_bar=True)
mcmc.run(random.PRNGKey(1), x=x, group_idx=groups, n_groups=n_groups, y=y)
samples = mcmc.get_samples()
def param_summary(arr):
    arr = np.asarray(arr)
    mean = arr.mean()
    lo, hi = hpdi(arr, prob=0.9)
    return mean, float(lo), float(hi)
for name in ["mu_alpha", "mu_beta", "sigma_alpha", "sigma_beta", "sigma_obs"]:
    m, lo, hi = param_summary(samples[name])
    print(f"{name}: mean={m:.3f}, HPDI=[{lo:.3f}, {hi:.3f}]")
predictive = Predictive(hierarchical_regression_model, samples, return_sites=["y"])
ppc = predictive(random.PRNGKey(2), x=x, group_idx=groups, n_groups=n_groups)
y_rep = np.asarray(ppc["y"])
group_to_plot = 0
mask = df["group"].values == group_to_plot
x_g = df.loc[mask, "x"].values
y_g = df.loc[mask, "y"].values
y_rep_g = y_rep[:, mask]
order = np.argsort(x_g)
x_sorted = x_g[order]
y_rep_sorted = y_rep_g[:, order]
y_med = np.median(y_rep_sorted, axis=0)
y_lo, y_hi = np.percentile(y_rep_sorted, [5, 95], axis=0)
plt.figure(figsize=(8, 5))
plt.scatter(x_g, y_g)
plt.plot(x_sorted, y_med)
plt.fill_between(x_sorted, y_lo, y_hi, alpha=0.3)
plt.show()
alpha_g = np.asarray(samples["alpha_g"]).mean(axis=0)
beta_g = np.asarray(samples["beta_g"]).mean(axis=0)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].bar(range(n_groups), alpha_g)
axes[0].axhline(truth["true_alpha"], linestyle="--")
axes[1].bar(range(n_groups), beta_g)
axes[1].axhline(truth["true_beta"], linestyle="--")
plt.tight_layout()
plt.show()
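The band plotted above checks one group by eye. A numeric complement is the empirical coverage of the same 5th to 95th percentile band: the fraction of observed points that fall inside it, which should be near 0.9 if the model is well calibrated. The following is a minimal sketch in plain Python so that it runs without the tutorial's dependencies; `interval_coverage` is a hypothetical helper, and the demo data is a synthetic stand-in rather than output from the fitted model.

```python
import random
import statistics


def interval_coverage(y_obs, y_rep, lower_pct=5, upper_pct=95):
    """Fraction of observations inside the central predictive interval.

    y_obs: list of n observed values.
    y_rep: list of posterior predictive draws, each a list of n values.
    """
    inside = 0
    for i, y in enumerate(y_obs):
        draws_i = sorted(rep[i] for rep in y_rep)
        # 99 cut points; "inclusive" interpolates linearly between order
        # statistics, in line with np.percentile's default behaviour.
        cuts = statistics.quantiles(draws_i, n=100, method="inclusive")
        lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
        inside += lo <= y <= hi
    return inside / len(y_obs)


# Hypothetical stand-in: observations and replicated draws from the same
# standard normal, so the predictive distribution is correct by construction.
rng = random.Random(0)
n_obs, n_draws = 50, 500
y_obs_demo = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
y_rep_demo = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_draws)]
print(f"empirical coverage: {interval_coverage(y_obs_demo, y_rep_demo):.2f}")
```

With the arrays from the working example, the same check could be run as `interval_coverage(df["y"].tolist(), y_rep.tolist())`. Coverage far below 0.9 would suggest predictive intervals that are too narrow; coverage near 1.0 would suggest they are too wide.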
Practical Applications
- Use Case: E-commerce demand forecasting with group-specific trends (e.g., regional sales patterns)
- Pitfall: Overfitting group-level parameters without sufficient data, leading to poor generalization