Teaching LLMs to Count: IBM's PD-SSM Breakthrough

These articles are AI-generated summaries. Please check the original sources for full details.

The quest to teach LLMs how to count

At NeurIPS 2025, IBM researchers introduced PD-SSM, a state-space model (SSM) that achieves 98.5% accuracy on sequential reasoning tasks. It addresses a critical limitation of transformers: their difficulty tracking state over long sequences.

Why This Matters

Transformers excel at parallel processing but struggle with state tracking, an inherently sequential task that logical reasoning depends on. The weakness shows up in simple tasks such as counting the “r”s in “strawberry” or determining the parity of a binary string (whether it contains an even or odd number of 1s). Workarounds such as chain-of-thought prompting exist, but they increase computational cost. IBM’s PD-SSM tackles the problem directly by improving state tracking in hybrid transformer-SSM models, enabling progress on complex tasks such as code generation and time-series forecasting.
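To make “state tracking” concrete: both example tasks reduce to reading a sequence one token at a time while updating a small hidden state. The plain-Python sketch below is only an illustration (the function names are hypothetical); it shows the sequential computation a model has to emulate internally, not how PD-SSM or a transformer actually implements it.

```python
def count_char(text: str, target: str) -> int:
    """Running count: one state update per character, left to right."""
    count = 0
    for ch in text:
        if ch == target:
            count += 1
    return count


def parity(bits: str) -> str:
    """Two-state automaton: the state flips on every '1' and stays put on '0'."""
    state = 0  # 0 = even number of 1s seen so far, 1 = odd
    for b in bits:
        if b == "1":
            state ^= 1
    return "odd" if state else "even"


print(count_char("strawberry", "r"))  # 3
print(parity("1101"))  # odd
```

Neither function ever looks back at earlier tokens; everything it needs is carried in one small state, which is exactly what a model must maintain reliably across long inputs.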

Key Insights

  • “PD-SSM achieves 98.5% accuracy on state tracking tasks, outperforming other SSM variants by 15 percentage points (IBM, 2025)”
  • “Transformers struggle with parity problems, a basic form of the state tracking that logical reasoning depends on (Hahn, 2020)”
  • “IBM’s Granite models integrate PD-SSM to improve efficiency in code generation and long-sequence analysis”
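The summary does not describe PD-SSM’s internals. The PD-SSM paper describes each input-dependent transition matrix as a product A = P·D of a column one-hot matrix P (a single nonzero entry per column) and a complex-valued diagonal matrix D. The pure-Python sketch below is a simplified illustration under that assumption, not IBM’s implementation: D is fixed to real-valued ones, the P matrices are hand-built from a hypothetical parity automaton instead of being learned, and all names (`PARITY_FSA`, `pd_step`, `run_pd_ssm`) are hypothetical. It shows why the structure is cheap: applying P·D to a state vector is a scale-and-scatter costing O(N), like a diagonal SSM, yet it can move mass between coordinates, so a one-hot state can follow an automaton exactly.

```python
from typing import Dict, List

# Hypothetical parity automaton. For each input symbol, next_state[j] is the
# state reached from state j (0 = even number of 1s so far, 1 = odd). Each
# table defines a column one-hot matrix P: column j has a single 1, in row
# next_state[j].
PARITY_FSA: Dict[str, List[int]] = {
    "0": [0, 1],  # '0' keeps the current state
    "1": [1, 0],  # '1' swaps even <-> odd
}


def pd_step(h: List[float], next_state: List[int], diag: List[float]) -> List[float]:
    """Compute (P @ D) @ h in O(N) without building the N x N matrix.

    D scales coordinate j by diag[j]; P then moves it to row next_state[j].
    """
    out = [0.0] * len(h)
    for j, value in enumerate(h):
        out[next_state[j]] += diag[j] * value
    return out


def run_pd_ssm(tokens: str, fsa: Dict[str, List[int]], start: int = 0) -> List[float]:
    """Run the recurrence h_t = P(x_t) D h_{t-1} from a one-hot start state."""
    n = len(next(iter(fsa.values())))
    h = [0.0] * n
    h[start] = 1.0
    ones = [1.0] * n  # simplification: D fixed to the identity, not learned
    for tok in tokens:
        h = pd_step(h, fsa[tok], ones)
    return h


print(run_pd_ssm("1101", PARITY_FSA))  # [0.0, 1.0]: odd number of 1s
```

Reading out the final one-hot vector gives the automaton’s state. In a trained model, P and D would instead be produced from the input at every step.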

Practical Applications

  • Use Case: IBM’s Granite models use PD-SSM for code generation and ethanol demand forecasting.
  • Pitfall: Restricting SSM transition matrices to diagonal form limits state tracking and can cause failures on tasks as simple as parity checking.
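One intuition for this pitfall (an illustration, not a proof): diagonal matrices commute, so the accumulated product of a diagonal SSM’s transitions is the same whatever order the tokens arrive in. Column one-hot matrices, which the PD-SSM paper pairs with a diagonal factor, need not commute and can therefore encode order-dependent state changes. The helper names and matrix values below are hypothetical, chosen only for the demonstration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product (fine for tiny examples)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def diag(values: List[float]) -> Matrix:
    """Diagonal matrix with `values` on the diagonal."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def one_hot_columns(next_state: List[int]) -> Matrix:
    """Column one-hot matrix: column j holds a single 1, in row next_state[j]."""
    n = len(next_state)
    m = [[0.0] * n for _ in range(n)]
    for j, i in enumerate(next_state):
        m[i][j] = 1.0
    return m


# Hypothetical transitions for two different input tokens.
d1, d2 = diag([0.9, 0.5, 0.2]), diag([0.3, 0.8, 0.6])
swap = one_hot_columns([1, 0, 2])   # exchange states 0 and 1
cycle = one_hot_columns([1, 2, 0])  # rotate 0 -> 1 -> 2 -> 0

# Diagonal transitions commute: token order cannot change the product.
print(matmul(d1, d2) == matmul(d2, d1))  # True
# Column one-hot transitions need not commute: order matters.
print(matmul(swap, cycle) == matmul(cycle, swap))  # False
```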
