Transformers

2 articles in this category

AI News, Transformers, Attention Mechanisms

Differential Transformer V2: Faster Decoding and Improved Stability

Microsoft's Differential Transformer V2 decodes at speeds comparable to standard Transformers while lowering language modeling loss by 0.02 to 0.03 at 1T tokens.

AI News, NLP, Transformers

Tokenization in Transformers v5: Simpler, Clearer, and More Modular

Transformers v5 redesigns tokenization by separating the tokenizer architecture from the trained vocabulary, which makes tokenizers easier to customize and reduces code duplication across models by 20%.
