AI Inference

2 articles in this category

AI News, AI Inference, Software Engineering

DFlash: Moving the Ceiling for Speculative Decoding Speed

DFlash achieves 6x lossless acceleration by replacing sequential drafting with parallel block diffusion, as reported by Z Lab in 2026.

AI News, Storage, AI Inference

Accelerating AI inference with IBM Storage Scale

IBM Storage Scale reduces time-to-first-token (TTFT) by 8-12x for LLM inference by providing a high-performance KV cache tier.
