Zyphra ZAYA1-8B: A 760M Active-Parameter MoE Model Outperforming Claude 4.5 Sonnet on Math

Zyphra Releases ZAYA1-8B: A Reasoning MoE Trained on AMD Hardware That Punches Far Above Its Weight Class

Zyphra AI has launched ZAYA1-8B, a Mixture of Experts model with only 760 million active parameters. The model was trained end-to-end on a cluster of 1,024 AMD Instinct MI300X nodes. It scores 89.6 on HMMT’25, surpassing Claude 4.5 Sonnet on this mathematical reasoning benchmark.

Why This Matters

Standard dense models activate every parameter for every token, so inference cost and latency grow with model size. ZAYA1-8B instead uses a Mixture of Experts (MoE) architecture, in which a router sends each token to a small subset of expert networks. This decouples representational capacity (total parameters) from per-token compute (active parameters). By optimizing for intelligence density, the model demonstrates that a specialized architecture can match frontier models on demanding math benchmarks while sharply reducing inference FLOPs and memory bandwidth requirements.
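The routing idea described above can be illustrated with a minimal top-k MoE sketch. It is a generic construction, not ZAYA1's implementation: the source describes an MLP-based router, whereas this sketch scores experts with a single linear layer, and the expert callables, dimensions, and `k` values are hypothetical. What it shows is that only the `k` selected experts run for a given token, so per-token compute scales with `k` rather than with the total number of experts.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, gate) pairs for the k highest-scoring experts.

    Gates are a softmax over the selected experts only, a common top-k MoE choice.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(
    x: Vector,
    router: Sequence[Sequence[float]],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Apply a toy MoE layer to a single token vector x.

    router: one weight row per expert (a linear stand-in for a learned router).
    experts: callables standing in for expert MLPs. Unselected experts never run.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    out = [0.0] * len(x)
    for idx, gate in top_k_route(logits, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Three hypothetical experts, top-1 routing: x scores highest on expert 0, so only it runs.
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
experts = [lambda v: [10 * vi for vi in v], lambda v: [0.0 for _ in v], lambda v: [-vi for vi in v]]
print(moe_layer([2.0, 1.0], router, experts, k=1))  # [20.0, 10.0]
```

With, say, 8 experts and `k=2`, each token touches 2/8 = 25% of the expert parameters; that ratio, not the total parameter count, sets the per-token FLOPs of the expert layers.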

Key Insights

Compressed Convolutional Attention (CCA) achieves 8× KV-cache compression compared to standard attention mechanisms (Zyphra, 2026).
The ZAYA1 MLP-based router utilizes PID-controller bias balancing to prevent expert load imbalance during training (Zyphra, 2026).
Markovian RSA test-time compute combines recursive self-aggregation with fixed-duration reasoning chunks to keep context windows bounded (Zyphra, 2026).
Training was performed on 1,024 AMD Instinct MI300X nodes using the AMD Pensando Pollara interconnect (IBM/Zyphra, 2026).
A five-stage post-training pipeline utilizes an RLVE-Gym phase with dynamically adjusted puzzle difficulty to train reasoning circuits (Zyphra, 2026).
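To make the 8× KV-cache figure concrete, the arithmetic below estimates cache size for a hypothetical model configuration; the layer count, KV heads, head dimension, and context length are illustrative, not ZAYA1's published dimensions. It models CCA only as a uniform 8× divisor on stored keys and values; how CCA achieves that compression is not represented.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch: int = 1,
    bytes_per_elem: int = 2,  # bf16/fp16
    compression: float = 1.0,
) -> float:
    """Estimate KV-cache size in bytes.

    Each layer stores one key and one value vector (factor of 2) per head, per token.
    `compression` divides the result, e.g. 8.0 for an 8x-compressed cache.
    """
    full = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem
    return full / compression


GIB = 2**30
cfg = dict(n_layers=32, n_kv_heads=16, head_dim=128, seq_len=32_768)  # hypothetical
print(kv_cache_bytes(**cfg) / GIB)                   # 8.0
print(kv_cache_bytes(**cfg, compression=8.0) / GIB)  # 1.0
```

At a 32K-token context, this hypothetical configuration drops from 8 GiB to 1 GiB per sequence, the kind of saving that matters for long reasoning traces and on-device memory budgets.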
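The source does not detail the PID controller, so the following is a hypothetical sketch of the general idea. It assumes each expert carries a bias that is added to its router score only when choosing the top-k experts (an assumption borrowed from bias-based, auxiliary-loss-free balancing schemes), and that a per-expert PID controller sets that bias from the gap between a uniform target load and the observed load. The class name and gains are illustrative.

```python
class PIDBiasBalancer:
    """Hypothetical per-expert PID controller for router selection biases.

    error = target_load - observed_load. An underused expert has positive error,
    so its bias rises and it is picked more often on the next step.
    """

    def __init__(self, n_experts: int, kp: float = 0.1, ki: float = 0.01, kd: float = 0.0):
        self.n_experts = n_experts
        self.kp, self.ki, self.kd = kp, ki, kd
        self.bias = [0.0] * n_experts
        self._integral = [0.0] * n_experts
        self._prev_error = [0.0] * n_experts

    def update(self, token_counts: list) -> list:
        """Recompute biases from one training step's per-expert token counts."""
        total = sum(token_counts)
        target = 1.0 / self.n_experts
        for i, count in enumerate(token_counts):
            error = target - count / total
            self._integral[i] += error                # I term: accumulated imbalance
            derivative = error - self._prev_error[i]  # D term: change since last step
            self._prev_error[i] = error
            # P + I + D terms combine into the bias (positional PID form)
            self.bias[i] = self.kp * error + self.ki * self._integral[i] + self.kd * derivative
        return list(self.bias)

    def biased_scores(self, logits: list) -> list:
        """Scores used only for top-k selection; gate weights would still use raw logits."""
        return [s + b for s, b in zip(logits, self.bias)]


balancer = PIDBiasBalancer(n_experts=4)
print(balancer.update([70, 10, 10, 10]))  # expert 0 is overloaded, so its bias goes negative
```

The integral term is what keeps correcting a persistent imbalance after the instantaneous error falls to zero; the proportional term alone would reset the bias as soon as loads looked even for one step.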
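As commonly described, recursive self-aggregation maintains a population of candidate solutions and repeatedly asks the model to merge small subsets of them into improved candidates. The sketch below pairs that loop with a fixed-size carryover under loudly stated assumptions: `generate` is a hypothetical stand-in for a model call, characters stand in for tokens, and each candidate contributes only its last `chunk_chars` characters to the next prompt. It is not Zyphra's harness; it only illustrates why prompt length stays bounded no matter how many rounds run.

```python
import random
from typing import Callable, List


def markovian_rsa(
    generate: Callable[[str, int], str],
    question: str,
    population: int = 4,
    subset: int = 2,
    rounds: int = 3,
    chunk_chars: int = 200,
    seed: int = 0,
) -> List[str]:
    """Hypothetical recursive self-aggregation loop with bounded context.

    generate(prompt, max_chars) is an assumed model call returning at most max_chars
    characters. Each new candidate sees the question plus only the last chunk_chars
    characters of `subset` sampled candidates, so prompts do not grow across rounds.
    """
    rng = random.Random(seed)
    candidates = [generate(question, chunk_chars) for _ in range(population)]
    for _ in range(rounds):
        next_gen = []
        for _ in range(population):
            picked = rng.sample(candidates, subset)
            tails = "\n---\n".join(c[-chunk_chars:] for c in picked)  # fixed-size carryover
            prompt = f"{question}\nCandidate solutions:\n{tails}\nAggregate into one improved solution:"
            next_gen.append(generate(prompt, chunk_chars))
        candidates = next_gen
    return candidates


def echo_model(prompt: str, max_chars: int) -> str:
    """Trivial stand-in model: returns a fixed string truncated to max_chars."""
    return ("step " * 100)[:max_chars]


final = markovian_rsa(echo_model, "Find all real x with x^2 = 4.", rounds=5)
print(len(final), max(len(c) for c in final))  # 4 200
```

Because the carryover is capped, the aggregation prompt in round five is exactly as long as in round one; the trade-off is that anything outside each candidate's final chunk is discarded.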
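The source says only that RLVE-Gym adjusts puzzle difficulty dynamically. One simple way to do that, sketched here as an assumption rather than Zyphra's method, is to track the policy's recent success rate per environment and step the difficulty level up when it solves most puzzles or down when it fails most of them, keeping training near the edge of its ability. The class name, thresholds, window size, and level range are hypothetical.

```python
from collections import deque


class AdaptiveDifficulty:
    """Hypothetical difficulty controller for one puzzle environment."""

    def __init__(self, level: int = 1, min_level: int = 1, max_level: int = 10,
                 window: int = 8, raise_at: float = 0.75, lower_at: float = 0.25):
        self.level = level
        self.min_level, self.max_level = min_level, max_level
        self.raise_at, self.lower_at = raise_at, lower_at
        self.results = deque(maxlen=window)  # rolling record of solved/unsolved

    def record(self, solved: bool) -> int:
        """Log one verified outcome and return the (possibly updated) difficulty level."""
        self.results.append(solved)
        if len(self.results) < self.results.maxlen:
            return self.level  # wait until the window is full
        accuracy = sum(self.results) / len(self.results)
        if accuracy >= self.raise_at and self.level < self.max_level:
            self.level += 1
            self.results.clear()  # re-measure at the new difficulty
        elif accuracy <= self.lower_at and self.level > self.min_level:
            self.level -= 1
            self.results.clear()
        return self.level


ctrl = AdaptiveDifficulty(window=4)
for outcome in [True, True, True, True]:
    level = ctrl.record(outcome)
print(level)  # 2
```

Clearing the window after each change forces a fresh measurement at the new level, which prevents a streak earned on easy puzzles from immediately pushing difficulty up a second time.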

Practical Applications

Use case: On-device deployment for local LLM applications requiring high intelligence density and low memory bandwidth.
Pitfall: Applying the Markovian RSA harness to models such as Qwen3-4B, which were not co-designed with it for reasoning, yields a smaller performance uplift.
Use case: Serverless inference for mathematical and coding tasks via Zyphra Cloud, with the weights also released under the Apache 2.0 license.
Pitfall: Neglecting active load balancing in MoE routers leads to unstable training and underutilization of the expert network.

References:

https://www.marktechpost.com/2026/05/06/zyphra-releases-zaya1-8b-a-reasoning-moe-trained-on-amd-hardware-that-punches-far-above-its-weight-class/
