Large Language Model

54 articles in this category (Page 2 of 3)

AI News, Large Language Model, Machine Learning

Optimizing LLM Throughput: How Paged Attention Achieves 98.5% Memory Utilization

Paged Attention solves the KV cache memory bottleneck, boosting GPU memory utilization from 24% to 98.5% through on-demand allocation and Copy-on-Write prefix sharing.

Mar 24, 2026

AI News, Agentic AI, Large Language Model

Luma Labs Uni-1: Bridging the Intent Gap with Autoregressive Reasoning Transformers

Luma Labs Uni-1 uses a decoder-only autoregressive transformer to reason through spatial logic before generation, outperforming Flux Max on RISEBench at $0.10 per image.

Mar 23, 2026

AI News, Artificial Intelligence, Large Language Model

Building Uncertainty-Aware LLM Systems with Confidence Estimation and Automated Web Research

A technical implementation of a three-stage LLM pipeline using Python to enable self-reported confidence scores, meta-cognitive self-evaluation, and automated web research for higher reliability.

Mar 21, 2026

AI News, Agentic AI, Large Language Model

NVIDIA Nemotron-Cascade 2: High-Density 30B MoE with Gold Medal Reasoning

NVIDIA’s Nemotron-Cascade 2 is a 30B MoE model with 3B active parameters that achieves Gold Medal-level results on IMO and IOI reasoning benchmarks.

Mar 20, 2026

AI News, Agentic AI, Large Language Model

ServiceNow Research Launches EnterpriseOps-Gym to Benchmark LLM Agentic Planning

ServiceNow Research's EnterpriseOps-Gym reveals that even top LLMs like Claude Opus 4.5 fail to exceed a 37.4% success rate in enterprise planning tasks.

Mar 18, 2026

AI News, Artificial Intelligence, Large Language Model

Building Type-Safe and Schema-Constrained LLM Pipelines with Outlines and Pydantic

Build production-grade LLM pipelines using Outlines and Pydantic to enforce schema validation and JSON recovery for reliable structured outputs.

Mar 14, 2026

AI News, Agentic AI, Large Language Model

NVIDIA Nemotron 3 Super: 120B Parameter Hybrid MoE Model for Agentic AI

NVIDIA's Nemotron 3 Super is a 120B parameter hybrid Mamba-Attention MoE model delivering 5x higher throughput for complex agentic AI applications.

Mar 11, 2026

AI News, Agentic AI, Large Language Model

Liquid AI Launches LocalCowork: Privacy-First Agent Workflows with LFM2-24B-A2B

Liquid AI releases LocalCowork and LFM2-24B-A2B, enabling local agentic workflows with 385ms tool-selection latency and a 14.5 GB memory footprint on consumer hardware.

Mar 5, 2026

AI News, Large Language Model, Machine Learning

Yuan 3.0 Ultra: Optimizing Trillion-Parameter MoE Efficiency via LAEP

YuanLab AI releases Yuan 3.0 Ultra, a 1T-parameter MoE model that achieves a 49% boost in pre-training efficiency. By utilizing Layer-Adaptive Expert Pruning and a Reflection Inhibition Reward Mechanism, it reduces total parameters by 33.3% while maintaining state-of-the-art performance in multimodal retrieval and enterprise benchmarks.

Mar 4, 2026

AI News, Large Language Model, Artificial Intelligence

How to Build a Stable and Efficient QLoRA Fine-Tuning Pipeline Using Unsloth for LLMs

Learn to build a stable QLoRA pipeline with Unsloth that efficiently fine-tunes 1.5B-parameter models using 4-bit quantization on limited GPU resources.

Mar 3, 2026

AI News, Large Language Model, Technology

Google AI Introduces STATIC: 948x Faster Constrained Decoding for LLM Generative Retrieval

Google DeepMind's STATIC framework delivers 948x faster constrained decoding for LLM retrieval, enabling 100% business logic compliance on TPUs.

Mar 1, 2026

AI News, Large Language Model, Machine Learning

Sakana AI Launches Doc-to-LoRA and Text-to-LoRA for Instant LLM Adaptation

Sakana AI introduces hypernetworks that generate LoRA adapters instantly, reducing the VRAM needed for a 128K-token document from 12GB to under 50MB.

Feb 27, 2026

AI News, Agentic AI, Large Language Model

Building Hierarchical AI Agents with Qwen2.5 and Python Tool Execution

Implement a multi-agent system using Qwen2.5-1.5B-Instruct to decompose tasks into 3-8 actionable steps with integrated Python tool execution.

Feb 27, 2026

AI News, Large Language Model, Machine Learning

Perplexity Releases pplx-embed: Qwen3-Based Bidirectional Models for Web-Scale RAG

Perplexity launches pplx-embed, a Qwen3-based embedding suite featuring bidirectional attention and native INT8 support for high-throughput retrieval tasks.

Feb 26, 2026

AI News, Large Language Model, Machine Learning

ByteDance AI Maps Molecular Bonds in Reasoning to Stabilize Long Chain-of-Thought Models

ByteDance researchers introduce MOLE-SYN, a framework that treats AI reasoning as molecular structures, stabilizing Long CoT performance across benchmarks like GSM8K and MATH-500.

Feb 22, 2026

AI News, Large Language Model, Technology

Instrumenting and Evaluating LLM Applications with TruLens and OpenAI

Build transparent RAG pipelines using TruLens to instrument traces and quantitatively evaluate LLM behavior across relevance and groundedness metrics.

Feb 22, 2026

AI News, Agentic AI, Large Language Model

Gemini 3.1 Pro: 1M Token Context and 77.1% ARC-AGI-2 Reasoning for AI Agents

Google releases Gemini 3.1 Pro with a 1M token context window and 77.1% ARC-AGI-2 reasoning score, targeting the high-performance autonomous AI agent market. This release focuses on reasoning stability, software engineering, and tool-use reliability for developers building next-generation autonomous agents and complex technical workflows.

Feb 19, 2026

AI News, Large Language Model, Open Source

MBZUAI Releases K2 Think V2: A Fully Sovereign 70B Reasoning Model For Math, Code, And Science

MBZUAI launched K2 Think V2, a fully sovereign 70 billion parameter reasoning model achieving a 90.42 pass rate on the AIME 2025 benchmark.

Jan 28, 2026

AI News, Large Language Model, Agentic AI

Alibaba Unveils Qwen3-Max-Thinking, a Trillion-Parameter Reasoning Model

Alibaba introduces Qwen3-Max-Thinking, a test-time scaled reasoning model with native tool use, achieving 92.8% accuracy on GPQA Diamond and 91.4% on LiveCodeBench v6.

Jan 28, 2026

AI News, AI Agents, Large Language Model

Moonshot AI Releases Kimi K2.5: An Open Source Visual Agentic Intelligence Model with Native Swarm Execution

Moonshot AI launched Kimi K2.5, an open-source visual agentic intelligence model boasting a 1T parameter scale and achieving state-of-the-art results in agentic benchmarks.

Jan 27, 2026

AI News, Large Language Model, AI Agents

NVIDIA Releases Nemotron 3: A Hybrid Mamba Transformer MoE Stack for Long Context Agentic AI

NVIDIA released the Nemotron 3 family of open models, with the Nano variant achieving 4x higher token throughput than Nemotron 2 Nano.

Dec 20, 2025

AI News, Language Model, Large Language Model

NVIDIA and Mistral AI Bring 10x Faster Inference for the Mistral 3 Family on GB200 NVL72 GPU Systems

NVIDIA and Mistral AI achieve 10x faster inference for Mistral 3 models on GB200 NVL72 GPUs, reaching 5M tokens per second per MW.

Dec 2, 2025

AI News, Agentic AI, Large Language Model

DeepSeek Introduces DeepSeek-V3.2 and DeepSeek-V3.2-Speciale for Long-Context Reasoning and Agentic Workloads

DeepSeek’s new models cut long-context inference costs by 50% while matching GPT-5 and Gemini 3.0 Pro reasoning benchmarks.

Dec 1, 2025

AI News, Artificial Intelligence, Large Language Model

DeepSeek AI Releases DeepSeekMath-V2: The Open Weights Maths Model That Scored 118/120 on Putnam 2024

DeepSeekMath-V2, a 685B-parameter open-weight model, scored 118/120 on Putnam 2024 with self-verifying theorem proving.

Nov 28, 2025