Liquid AI LFM2.5-350M: High-Density Edge Intelligence via 28T Token Training


These articles are AI-generated summaries. Please check the original sources for full details.

Liquid AI Released LFM2.5-350M: A Compact 350M Parameter Model Trained on 28T Tokens with Scaled Reinforcement Learning

Liquid AI has released LFM2.5-350M, a compact 350M-parameter model that prioritizes intelligence density over parameter count, challenging the assumption that capability must come from scaling model size. The model was pre-trained on 28 trillion tokens, a token-to-parameter ratio of 80,000:1.
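The 80,000:1 figure follows directly from the two numbers quoted above. A minimal sketch of the arithmetic, treating "350M" and "28T" as exact round values (an assumption, since the precise counts are not given here):

```python
# Round figures quoted in the article; the exact counts may differ slightly.
TRAINING_TOKENS = 28 * 10**12   # 28 trillion tokens
PARAMETERS = 350 * 10**6        # 350 million parameters

def tokens_per_parameter(tokens: int, parameters: int) -> float:
    """Return how many training tokens were seen per model parameter."""
    return tokens / parameters

ratio = tokens_per_parameter(TRAINING_TOKENS, PARAMETERS)
print(f"{ratio:,.0f}:1")  # 80,000:1
```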

Why This Matters

While frontier models pursue capability mainly by increasing parameter counts, LFM2.5-350M targets the “memory wall” bottleneck on edge devices with limited memory and compute. Its hybrid backbone combines Linear Input-Varying Systems (LIVs) with Grouped Query Attention (GQA), providing a 32k context window with a memory footprint as low as 81MB on mobile GPUs. The results suggest that parameter count is not the sole determinant of performance.

Key Insights

  • Hybrid LIV/GQA Architecture: The model uses 10 Double-Gated LIV Convolution Blocks for sequence processing and 6 GQA blocks for high-precision retrieval, reducing KV cache overhead (2026).
  • Extreme Intelligence Density: Training on 28T tokens allows this 350M parameter model to outperform competitors twice its size on benchmarks like IFEval, where it scored 76.96 (2026).
  • High-Speed Inference: On a single NVIDIA H100, the architecture sustains a throughput of 40.4K output tokens per second, which suits real-time agentic tasks (2026).
  • Edge-Specific Optimization: Low-memory inference is achieved via RunAnywhere Q4, requiring only 169MB on Snapdragon 8 Elite NPUs and 81MB on Snapdragon GPUs (2026).
  • Instruction Following Specialist: With a GPQA Diamond score of 30.64 and high IFEval results, the model is tuned for tool use and structured data extraction (2026).
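The KV cache point in the first insight can be illustrated with a rough calculation. The sketch below assumes that only the 6 GQA blocks store per-token keys and values, and that the 10 LIV convolution blocks store none. The head count, head dimension, and 16-bit cache precision are placeholder values chosen for illustration, not the model's published configuration. Only the relative saving is meaningful, and it depends solely on the block counts.

```python
def kv_cache_bytes(attn_layers: int, seq_len: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values across all attention layers."""
    # Factor 2: one cached tensor for keys and one for values per layer.
    return 2 * attn_layers * seq_len * kv_heads * head_dim * bytes_per_value

# Block counts taken from the article.
LIV_BLOCKS = 10
GQA_BLOCKS = 6

# ASSUMPTIONS: placeholder values, not the model's published configuration.
SEQ_LEN = 32 * 1024   # "32k" context read as 32,768 tokens
KV_HEADS = 8          # hypothetical number of key/value heads
HEAD_DIM = 64         # hypothetical per-head dimension

# Hybrid stack: only the GQA blocks are assumed to hold a KV cache.
hybrid = kv_cache_bytes(GQA_BLOCKS, SEQ_LEN, KV_HEADS, HEAD_DIM)
# Hypothetical baseline: all 16 blocks are attention blocks.
all_attention = kv_cache_bytes(LIV_BLOCKS + GQA_BLOCKS, SEQ_LEN,
                               KV_HEADS, HEAD_DIM)

print(f"hybrid / all-attention KV cache: {hybrid / all_attention:.3f}")  # 0.375
```

Under these assumptions the hybrid stack needs 6/16 of the cache that an all-attention stack of the same depth would need, whatever head shape is plugged in.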

Practical Applications

  • Use Case: High-volume data extraction and real-time classification on Raspberry Pi 5 using Cactus Engine int8 with a 300MB memory footprint. Pitfall: Attempting complex mathematics or creative writing, which the model documentation explicitly advises against.
  • Use Case: Local agentic tasks and function calling on Snapdragon 8 Elite mobile devices using RunAnywhere Q4 for low-latency tool use. Pitfall: Using the model for complex coding tasks, where larger reasoning models remain necessary.
