Attention Mechanisms

1 article in this category

Differential Transformer V2: Faster Decoding and Improved Stability

Microsoft's Differential Transformer V2 decodes at speeds comparable to standard Transformers while reducing language modeling loss by 0.02 to 0.03 at 1T tokens.

Jan 20, 2026