Identifying Influential LLM Interactions at Scale with SPEX and ProxySPEX

Identifying Interactions at Scale for LLMs

Berkeley researchers developed SPEX, an algorithm that uses signal processing to isolate the interactions that most influence LLM behavior. By exploiting the sparsity of those interactions, it scales interaction-level interpretability from dozens of components to thousands.

Why This Matters

Model behavior emerges from complex dependencies rather than from isolated components. To achieve state-of-the-art performance, models synthesize relationships between features and process information through interconnected internal structures. Analyzing these interactions exhaustively is usually computationally infeasible, because the number of candidate interactions grows exponentially with the number of components. SPEX addresses this by reframing interaction discovery as a sparse recovery problem, which makes grounded interpretability feasible at scale.
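To make the scaling argument concrete, here is a minimal Python sketch. It counts candidate interactions and then computes an exact Boolean Fourier (Walsh-Hadamard) transform of a small toy function by brute force. The function `toy_value` and all of its weights are hypothetical stand-ins for a model's output under ablation; nothing here is SPEX itself, which is designed precisely to avoid the exhaustive queries this brute-force version needs.

```python
from itertools import combinations, product
from math import comb


def count_interactions(n, max_degree=None):
    """Number of non-empty component subsets, optionally capped at max_degree."""
    top = n if max_degree is None else max_degree
    return sum(comb(n, k) for k in range(1, top + 1))


def toy_value(mask):
    """Hypothetical model output; mask[i] == 1 keeps component i, 0 ablates it."""
    return 2.0 * mask[0] + 1.5 * mask[1] * mask[2] - 1.0 * mask[3]


def fourier_coefficients(f, n):
    """Exact Boolean Fourier transform by brute force (needs all 2**n queries)."""
    masks = list(product((0, 1), repeat=n))
    values = [f(m) for m in masks]
    coeffs = {}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            total = 0.0
            for m, v in zip(masks, values):
                # Parity character: -1 if an odd number of subset members are kept.
                sign = -1 if sum(m[i] for i in subset) % 2 else 1
                total += sign * v
            coeffs[subset] = total / len(masks)
    return coeffs


print(count_interactions(20))        # all interactions among 20 components
print(count_interactions(1000, 2))   # only singletons and pairs among 1000

coeffs = fourier_coefficients(toy_value, 6)
nonzero = {s: c for s, c in coeffs.items() if abs(c) > 1e-9}
print(len(nonzero), "of", len(coeffs), "coefficients are nonzero")
```

In this toy case only a handful of the 64 coefficients are nonzero, and none involves more than two components. That is the kind of sparse, low-degree structure that makes a sparse recovery formulation plausible.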

Key Insights

SPEX exploits the fact that influential interactions are sparse and low-degree, which turns interaction discovery into a tractable sparse recovery problem (ICML, 2025).
ProxySPEX matches SPEX performance with 10x fewer ablations by exploiting hierarchical structural properties (NeurIPS, 2025).
Feature attribution via SPEX identified high-order synergies between keywords that caused a 92% failure rate in GPT-4o mini on modified trolley problems.
On CIFAR-10, data attribution for ResNet models distinguishes synergistic interactions, which define decision boundaries, from redundant interactions, which reinforce a concept.
Mechanistic analysis reveals that early transformer layers function linearly, while later layers exhibit significant intra-layer attention head interactions.
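The sparsity and low-degree assumptions in the first insight can be illustrated with a small sketch. The code below uses the classical low-degree sampling estimator: it queries a toy function on random ablation masks and averages to estimate every Fourier coefficient up to degree 2. This is a deliberately simple substitute, not the SPEX or ProxySPEX algorithm. The function `toy_value`, the sample count, and the threshold are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def toy_value(mask):
    """Hypothetical model output; mask[i] == 1 keeps component i, 0 ablates it."""
    return 2.0 * mask[0] + 1.5 * mask[1] * mask[2] - 1.0 * mask[3]


def estimate_low_degree(f, n, max_degree, num_samples, seed=0):
    """Monte Carlo estimate of Fourier coefficients for subsets up to max_degree."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        mask = tuple(rng.randint(0, 1) for _ in range(n))
        samples.append((mask, f(mask)))  # one ablation query per random mask
    estimates = {}
    for k in range(max_degree + 1):
        for subset in combinations(range(n), k):
            total = 0.0
            for mask, value in samples:
                sign = -1 if sum(mask[i] for i in subset) % 2 else 1
                total += sign * value
            estimates[subset] = total / num_samples
    return estimates


def top_terms(estimates, threshold):
    """Keep only coefficients whose estimated magnitude clears the threshold."""
    return {s: c for s, c in estimates.items() if abs(c) >= threshold}


# 16 components give 65,536 possible masks; this sketch queries 4,000 of them.
estimates = estimate_low_degree(toy_value, n=16, max_degree=2, num_samples=4000)
for subset, coeff in sorted(top_terms(estimates, threshold=0.1875).items()):
    print(subset, round(coeff, 2))
```

Restricting the search to low-degree subsets keeps the number of candidates polynomial in the number of components, and sparsity means a threshold can separate the few real terms from sampling noise. The estimated values are noisy approximations of the true coefficients, so the threshold has to sit well above the noise level.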

Practical Applications

Task-specific attention head pruning: Using ProxySPEX-informed strategies to improve model performance on MMLU history tasks by removing non-essential interacting components.
Data selection for training: Distinguishing redundant semantic duplicates from synergistic samples that define decision boundaries in image classification datasets.
Failure mode analysis: Identifying high-order synergies in prompt keywords to debug model reasoning errors that standard marginal attribution methods miss.
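The distinction between synergy and redundancy, and the way marginal attribution can miss both, can be sketched with a standard inclusion-exclusion interaction index. Everything below is hypothetical: `toy_score` is an invented scoring function, and the index is a generic pairwise measure rather than the quantity SPEX reports.

```python
def toy_score(mask):
    """Hypothetical score over five components (1 = kept, 0 = ablated)."""
    synergy = 1.0 if mask[0] and mask[1] else 0.0     # needs both 0 and 1
    redundancy = 1.0 if mask[2] or mask[3] else 0.0   # either 2 or 3 suffices
    return synergy + redundancy + 0.5 * mask[4]       # 4 acts on its own


def leave_one_out(f, n, i):
    """Marginal attribution: drop in score when only component i is ablated."""
    full = (1,) * n
    ablated = tuple(0 if k == i else 1 for k in range(n))
    return f(full) - f(ablated)


def pairwise_interaction(f, n, i, j):
    """Inclusion-exclusion interaction of i and j with all other components kept."""
    def value(keep_i, keep_j):
        mask = [1] * n
        mask[i], mask[j] = keep_i, keep_j
        return f(tuple(mask))
    return value(1, 1) - value(1, 0) - value(0, 1) + value(0, 0)


def classify(interaction, tol=1e-9):
    """Label the sign of an interaction value."""
    if interaction > tol:
        return "synergistic"
    if interaction < -tol:
        return "redundant"
    return "independent"


n = 5
print([leave_one_out(toy_score, n, i) for i in range(n)])
for pair in [(0, 1), (2, 3), (0, 4)]:
    value = pairwise_interaction(toy_score, n, *pair)
    print(pair, value, classify(value))
```

In this toy setup, leave-one-out assigns zero credit to components 2 and 3 even though ablating both changes the score, and it credits components 0 and 1 separately for an effect they only produce together. The pairwise index exposes both patterns through its sign.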
