Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective
LinkedIn successfully enabled agentic reinforcement learning training for the GPT-OSS-20B model, achieving performance comparable to OpenAI’s o3-mini and o4-mini.
AI News · LLMs · Software Agents
Unrolling the Codex agent loop
A technical deep dive into the Codex agent loop, explaining how Codex CLI orchestrates the model, tools, and prompts, and how it manages performance to keep agent behavior efficient.
AI News · Game Development · LLMs
PiGym – LLM-Generated Pi Digit Memorization Game
PiGym demonstrates Claude Opus 4.5’s ability to build a functional game on its own from natural-language descriptions.
AI News · LLMs · Cost Optimization
The $10K/Month Mistake: Stop Bleeding Money on Your AI Agents
AI agents built with Claude can quickly become expensive; optimizing system prompts and using Skills can cut costs by more than 60%.
AI News · LLMs · Adoption
OpenAI Surpasses One Million Customers, Enabling Novel Task Completion
OpenAI has surpassed one million customers globally, with 75% reporting that they can now complete tasks that were previously impossible.
AI News · Transformer Models · LLMs
Adapting Rotary Position Embeddings (RoPE) for Long Context Lengths
Llama 3.1 reaches a 131K-token context length by rescaling RoPE frequencies, improving long-range stability while preserving local positional information.
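The RoPE frequency scaling summarized above can be sketched in a few lines of Python. This is a minimal sketch modeled on the publicly released Llama 3.1 ("llama3") RoPE scaling scheme, not code from the article; the defaults (scale factor 8, low- and high-frequency factors 1 and 4, an original 8,192-token context, base θ = 500,000, head dimension 128) are assumptions. High-frequency components, which encode local position, are left untouched; low-frequency components are divided by the scale factor; frequencies in between are blended smoothly.

```python
import math


def rope_inv_freqs(head_dim: int = 128, theta: float = 500_000.0) -> list[float]:
    """Standard RoPE inverse frequencies theta^(-2i/d) for i in [0, d/2)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def scale_rope_freqs(
    freqs: list[float],
    scale_factor: float = 8.0,      # assumed stretch applied to low frequencies
    low_freq_factor: float = 1.0,   # assumed
    high_freq_factor: float = 4.0,  # assumed
    original_context: int = 8192,   # assumed pre-extension context length
) -> list[float]:
    """Rescale RoPE frequencies by wavelength band (llama3-style sketch)."""
    low_freq_wavelen = original_context / low_freq_factor    # longer than this: scale fully
    high_freq_wavelen = original_context / high_freq_factor  # shorter than this: keep as-is
    scaled = []
    for f in freqs:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            # Short wavelengths carry local positional detail: leave unchanged.
            scaled.append(f)
        elif wavelen > low_freq_wavelen:
            # Long wavelengths: lower the frequency so it spans a longer context.
            scaled.append(f / scale_factor)
        else:
            # Transition band: interpolate between scaled and unscaled frequency.
            smooth = (original_context / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * f / scale_factor + smooth * f)
    return scaled
```

For example, `scale_rope_freqs(rope_inv_freqs())` returns 64 frequencies in which the highest frequency is unchanged and the lowest is divided by 8. Scaling by wavelength band rather than uniformly is what lets the long-range components stretch without disturbing local position information.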
AI News · LLMs · Agentic AI
Nemotron 3 Nano – A New Standard for Efficient, Open, and Intelligent Agentic Models
NVIDIA’s Nemotron 3 Nano 30B-A3B model achieves up to 3.3x higher throughput than leading models while maintaining best-in-class reasoning accuracy.
AI News · llama.cpp · LLMs
New llama.cpp Server Feature: Dynamic Model Management
The llama.cpp server introduces a router mode that loads and switches between multiple models dynamically, without restarting the server.
AI News · LLMs · Customer Service
Salesforce's eVerse Simulates Realistic Customer Service Interactions
Salesforce’s eVerse simulation tool aims to improve AI agent performance in noisy, unpredictable call centers, achieving 84–88% coverage of routine inquiries.
AI News · LLMs · Evaluation
FACTS Benchmark Suite: A New Evaluation for LLM Factuality
The FACTS Benchmark Suite provides a systematic evaluation of LLM factuality across reasoning types and found that every evaluated model scored below 70% accuracy.
AI News · Privacy · LLMs
Privacy in Action: Realistic mitigation and evaluation for agentic LLMs
New research from Microsoft demonstrates two approaches to reducing privacy leaks in AI agents, achieving up to a 25% reduction in information leakage while preserving task completion.
AI News · Machine Learning · LLMs
Salesforce AI Research Introduces xRouter: A Reinforcement Learning Router for Cost-Aware LLM Orchestration
Salesforce’s xRouter achieves near-GPT-5 accuracy on OlympiadBench while cutting evaluation cost by 87.5% relative to GPT-5.
AI News · Apple Development · LLMs
Introducing AnyLanguageModel: One API for Local and Remote LLMs on Apple Platforms
AnyLanguageModel simplifies LLM integration for Apple developers, offering a single API to seamlessly switch between local and remote models.
AI News · LLMs · Inference
Continuous batching from first principles
Continuous batching raises LLM throughput by mixing the prefill and decode phases of different requests in the same batch, achieving up to a 2x speedup in token generation.
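The scheduling idea behind continuous batching can be illustrated with a toy simulator. This is a minimal sketch under simplifying assumptions, not the article's implementation: each loop iteration stands for one forward pass, a newly admitted request's prefill runs in the same step as other requests' decode and emits its first token, and KV-cache memory and token budgets are ignored. `Request` and `run_continuous_batching` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    max_new_tokens: int
    generated: int = 0
    prefilled: bool = False


def run_continuous_batching(requests: list[Request], max_batch_size: int):
    """Simulate step-level scheduling; returns (per-step schedule, finish order)."""
    waiting = deque(requests)
    running: list[Request] = []
    schedule: list[list[tuple[str, str]]] = []
    finished: list[str] = []
    while waiting or running:
        # Refill free slots at every step, not only when the whole batch drains.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        step = []
        for req in running:
            # New requests prefill while older ones decode, in the same step.
            phase = "decode" if req.prefilled else "prefill"
            req.prefilled = True
            req.generated += 1  # both phases emit one token in this toy model
            step.append((req.rid, phase))
        schedule.append(step)
        # Evict finished sequences immediately so their slots free up next step.
        still_running = []
        for req in running:
            if req.generated >= req.max_new_tokens:
                finished.append(req.rid)
            else:
                still_running.append(req)
        running = still_running
    return schedule, finished
```

With requests needing 3, 1, and 2 tokens and a batch size of 2, the third request is admitted as soon as the second finishes, so all three complete in 3 steps; a static batch that waits for its slowest member would need 5 (3 for the first pair, then 2 for the last request).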
AI News · LLMs · AI Architecture
Teaching LLMs to Count: IBM's PD-SSM Breakthrough
IBM's PD-SSM model achieves 98.5% accuracy on state-tracking tasks, addressing LLM limitations in sequential reasoning.
AI News · Transparency · LLMs
IBM Granite Is Ranked the World’s Most Transparent Model
IBM Granite scored 95 out of 100 on the Stanford Foundation Model Transparency Index, 23 points ahead of the next-highest score.
AI News · LLMs · AI Evaluation
IBM and Notre Dame Open-Source Benchmark Cards for LLMs
IBM and the University of Notre Dame released 105 validated benchmark cards and a dataset of 4,000 cards to improve LLM evaluation transparency.