AI Inference

2 articles in this category

AI News, AI Inference, Software Engineering

DFlash achieves 6x lossless acceleration by replacing sequential drafting with parallel block diffusion, as reported by Z Lab in 2026.

Apr 7, 2026

AI News, Storage, AI Inference

By providing a high-performance KV cache tier, IBM Storage Scale reduces time-to-first-token (TTFT) for LLM inference by 8-12x.

Feb 9, 2021