NVIDIA’s Extreme Co-Design: From GPU Hardware to Fully Open Nemotron LLMs

Even the chip makers are making LLMs

NVIDIA VP Kari Briski explains why the company has become a full-stack provider by developing the Nemotron family of models. Since 2018, NVIDIA has run a rapid hardware-software feedback loop in which demanding LLM workloads drive GPU architecture decisions.

Why This Matters

The gap between what AI models demand in theory and what hardware delivers efficiently often creates significant performance bottlenecks. With ‘extreme co-design,’ NVIDIA feeds model requirements into hardware planning (the NVFP4 precision format in Blackwell is one example) so that memory hierarchies and networking stacks are purpose-built for agentic systems. The approach moves beyond general-purpose computing toward one where software libraries and hardware SKUs are synchronized to handle million-token context lengths and disaggregated serving.
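To make the memory side of that argument concrete, here is a rough back-of-the-envelope sketch of weight storage at 16-bit versus 4-bit precision. It is illustrative only. The 30-billion-parameter model size is hypothetical, and the layout (4-bit values with one 8-bit scale shared per block of 16 values) is an assumption about how NVFP4 is commonly described, not a figure taken from the interview. The function name `weight_bytes` is also made up for this example.

```python
def weight_bytes(num_params: int, value_bits: int,
                 block_size: int = 0, scale_bits: int = 0) -> float:
    """Approximate bytes needed to store `num_params` weights.

    If `block_size` is non-zero, every block of that many values shares
    one scale factor of `scale_bits` bits, which adds a small overhead.
    """
    bits_per_param = float(value_bits)
    if block_size:
        bits_per_param += scale_bits / block_size
    return num_params * bits_per_param / 8


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 30_000_000_000

# 16-bit weights, no block scaling.
bf16_bytes = weight_bytes(NUM_PARAMS, 16)

# Assumed 4-bit layout: one 8-bit scale per block of 16 values.
fp4_bytes = weight_bytes(NUM_PARAMS, 4, block_size=16, scale_bits=8)

print(f"16-bit weights:             {bf16_bytes / 1e9:.1f} GB")
print(f"4-bit block-scaled weights: {fp4_bytes / 1e9:.1f} GB")
print(f"reduction:                  {bf16_bytes / fp4_bytes:.2f}x")
```

Under these assumptions the shared scales cost half a bit per weight, so the saving is a little under the naive 4x. The sketch covers weights only; activations and any cached context would need their own accounting.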

Key Insights

NVIDIA Blackwell supports NVFP4, a 4-bit floating-point precision format that reduces memory footprint while, according to NVIDIA, retaining accuracy better than post-training quantization does.
The Nemotron family includes Nano, Super, and Ultra models, with Nano V3 released in late 2025 and Ultra scheduled for April 2026.
The hybrid architecture, which combines Mamba state space model layers with Transformer layers, improves token efficiency by avoiding the quadratic growth in inference cost with sequence length that pure attention-based models incur.
NVIDIA’s Dynamo framework enables disaggregated serving, allowing prefill and decode tasks to run on different GPU SKUs for maximum efficiency.
The $180,000 AI robotics competition launched by Intrinsic and NVIDIA targets dexterous cable management using open-source AI tools.
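The scaling claim behind the hybrid Mamba-Transformer insight above can be illustrated with a deliberately simplified cost model. This is a sketch, not a description of how Nemotron is implemented: the function names and the `D_MODEL` and `D_STATE` values are hypothetical, and the counts ignore constant factors, projections, MLP layers, and the fact that a hybrid model still keeps some attention layers.

```python
def attention_ops(seq_len: int, d_model: int) -> int:
    """Rough multiply-add count for self-attention over a full sequence.

    Every token is compared with every other token, so both the score
    matrix and the weighted sum over values cost about seq_len^2 * d_model.
    """
    return 2 * seq_len * seq_len * d_model


def ssm_ops(seq_len: int, d_model: int, d_state: int) -> int:
    """Rough multiply-add count for a state space scan.

    Each token updates a fixed-size state once, so the cost grows
    linearly with seq_len.
    """
    return seq_len * d_model * d_state


# Hypothetical layer dimensions, chosen only for illustration.
D_MODEL = 4096
D_STATE = 128

for n in (8_192, 65_536, 1_048_576):
    ratio = attention_ops(n, D_MODEL) / ssm_ops(n, D_MODEL, D_STATE)
    print(f"{n:>9} tokens: attention/SSM cost ratio ~ {ratio:,.0f}")
```

In this model, doubling the sequence length quadruples the attention cost but only doubles the state space cost, which is why the gap widens as context lengths approach the million-token range.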

Practical Applications

Domain Specialization: ServiceNow utilized NVIDIA’s open data to create the Apriel model and custom ‘gym’ environments for task-specific verification.
Agentic Memory Management: Context memory engines store and recall million-token contexts for complex coding and documentation tasks.
Cybersecurity: Partners leverage open-source weights to build specialized verifiers that identify false positives in threat detection systems.

References:

https://stackoverflow.blog/2026/03/10/even-the-chip-makers-are-making-llms/
intrinsic.ai/stack
