LLMs

17 articles in this category

AI News, Reinforcement Learning, LLMs

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

LinkedIn enabled agentic reinforcement learning training for the GPT-OSS-20B model, achieving performance comparable to OpenAI’s o3-mini and o4-mini.

Jan 27, 2026

AI News, LLMs, Software Agents

Unrolling the Codex agent loop

A technical deep dive into the Codex agent loop, explaining how Codex CLI orchestrates models, tools, and prompts, and how it manages performance to keep agent behavior efficient.

Jan 23, 2026

AI News, Game Development, LLMs

PiGym – LLM-Generated Pi Digit Memorization Game

PiGym demonstrates that Claude Opus 4.5 can independently develop a functional game from natural-language descriptions.

Jan 10, 2026

AI News, LLMs, Cost Optimization

The $10K/Month Mistake: Stop Bleeding Money on Your AI Agents

AI agents built with Claude can quickly become expensive; optimizing system prompts and utilizing Skills can reduce costs by over 60%.

Dec 30, 2025

AI News, LLMs, Adoption

OpenAI Surpasses One Million Customers, Enabling Novel Task Completion

OpenAI has reached over one million customers globally, with 75% reporting the ability to complete tasks previously impossible.

Dec 22, 2025

AI News, Transformer Models, LLMs

Adapting Rotary Position Embeddings (RoPE) for Long Context Lengths

Llama 3 achieves 131K token context length by scaling RoPE frequencies, improving long-range stability without sacrificing local positional information.

Dec 20, 2025

AI News, LLMs, Agentic AI

Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models

NVIDIA’s Nemotron 3 Nano 30B A3B model achieves up to 3.3x higher throughput than leading models while maintaining best-in-class reasoning accuracy.

Dec 15, 2025

AI News, llama.cpp, LLMs

New llama.cpp Server Feature: Dynamic Model Management

llama.cpp server introduces router mode, enabling dynamic loading and switching between multiple models without restarts.

Dec 11, 2025

AI News, LLMs, Customer Service

Salesforce's eVerse Simulates Realistic Customer Service Interactions

Salesforce’s eVerse simulation tool aims to improve AI agent performance in noisy, unpredictable call centers, achieving 84-88% coverage of routine inquiries.

Dec 11, 2025

AI News, LLMs, Evaluation

FACTS Benchmark Suite: A New Evaluation for LLM Factuality

The FACTS Benchmark Suite provides a systematic evaluation of LLM factuality across reasoning types, revealing that all evaluated models scored below 70% accuracy.

Dec 9, 2025

AI News, Privacy, LLMs

Privacy in Action: Realistic mitigation and evaluation for agentic LLMs

New research from Microsoft demonstrates two approaches to reducing privacy leaks in AI agents, achieving up to a 25% reduction in information leakage while preserving task completion.

Nov 25, 2025

AI News, Machine Learning, LLMs

Salesforce AI Research Introduces xRouter: A Reinforcement Learning Router for Cost Aware LLM Orchestration

Salesforce’s xRouter achieves near-GPT-5 accuracy on Olympiad Bench while reducing GPT-5 evaluation cost by 87.5%.

Nov 25, 2025

AI News, Apple Development, LLMs

Introducing AnyLanguageModel: One API for Local and Remote LLMs on Apple Platforms

AnyLanguageModel simplifies LLM integration for Apple developers, offering a single API to seamlessly switch between local and remote models.

Nov 20, 2025

AI News, LLMs, Inference

Continuous batching from first principles

Continuous batching maximizes LLM throughput by intelligently combining prefill and decode phases, achieving up to a 2x speedup in token generation.

Sep 11, 2025

AI News, LLMs, AI Architecture

Teaching LLMs to Count: IBM's PD-SSM Breakthrough

IBM's PD-SSM model achieves 98.5% accuracy on state tracking tasks, addressing LLM limitations in sequential reasoning.

Feb 9, 2021

AI News, Transparency, LLMs

IBM Granite is Ranked World’s Most Transparent Model

IBM Granite achieved a 95% score on the Stanford Foundation Model Transparency Index, surpassing all other models by 23 percentage points.

Feb 9, 2021

AI News, LLMs, AI Evaluation

IBM and Notre Dame Open-Source Benchmark Cards for LLMs

IBM and University of Notre Dame released 105 validated benchmark cards and a dataset of 4,000 cards to improve LLM evaluation transparency.

Feb 9, 2021