llm

23 articles in this category

Python, LLM, AI

Codexity Part 6: Small Model Inference with llama-cpp-python

Run a quantized 7B model locally to generate cited answers from scraped web content. Choose between Qwen, Mistral, Phi, and Llama models. Build prompts that make small models behave like large ones.

Sep 15, 2026

Python, AI, LLM

Codexity Part 2: Query Rewriting with LLMs

A user types a vague question. The query rewriter transforms it into targeted search queries using a local LLM. We cover intent classification, query decomposition, and prompt engineering that actually works with small models.

Jun 23, 2026

AI News, LLM, Software Engineering

Understanding LLM API Architecture: Request Patterns, Tokenization, and Cost Optimization

Learn how LLM APIs function under the hood and why output tokens can cost 3–5× more than input tokens.

May 26, 2026

AI News, LLM, Observability

Beyond the Green Dot: Advanced LLM Observability Lessons from OpenAI Outages

OpenAI's status page lagged 90 minutes during the April 2026 outage; instrumenting five key signals like TTFT and token throughput is essential for reliable AI infrastructure.

Apr 26, 2026

AI News, LLM, Observability

Essential Observability: 3 Critical Alerts for LLM Systems

Prevent runaway LLM costs and quality drift using OpenTelemetry GenAI conventions to monitor per-trace spend and retrieval relevance.

Apr 26, 2026

ai, agents, python

AI Agents from Scratch Part 6: Complete Agent & Best Practices (Research Report Generator)

The finale! Run your complete Research Report Generator, learn best practices, explore advanced memory strategies, and discover how to extend your agent with new capabilities.

Mar 29, 2026

AI Agents, LLM, Architecture

How I Built an AI System That Writes Full-Length Books

A multi-agent pipeline that autonomously generates complete books—from technical manuals to fantasy novels—with built-in research, quality control, and hallucination prevention.

Mar 28, 2026

ai, agents, python

AI Agents from Scratch Part 5: The Agent Core & Loop (Research Report Generator)

Build the brain of your AI agent! Implement the ReAct loop, system prompts, tool execution, and phase handlers that orchestrate the entire research workflow.

Mar 17, 2026

ai, agents, python

AI Agents from Scratch Part 4: Human-in-the-Loop Validation (Research Report Generator)

Keep humans in control of AI agents. Build checkpoints for plan approval, source selection, fact verification, and draft review—so agents stay helpful without going rogue.

Mar 8, 2026

ai, agents, python

AI Agents from Scratch Part 3: State Management & Memory (Research Report Generator)

Give your AI agent a memory! Learn short-term vs long-term memory, prevent context overflow, and enable agents to resume interrupted work.

Feb 28, 2026

AI News, AI, LLM

LLM Grounding: Connecting Language Models to Reality

Grounding reduces hallucinations by 42–68%.

Feb 20, 2026

ai, agents, python

AI Agents from Scratch Part 2: Building the Tool System (Research Report Generator)

Give your AI agent superpowers! Build a clean tool system with web search, content extraction, and file operations—the foundation that lets agents interact with the real world.

Feb 15, 2026

ai, agents, python

AI Agents from Scratch Part 1: Understanding the ReAct Pattern (Research Report Generator)

Start your journey building AI agents without frameworks. Learn the foundational ReAct pattern that powers modern agents—with a hands-on Research Report Generator example.

Jan 28, 2026

AI News, Spring AI, LLM

Overview of MCP Annotations in Spring AI

Explore the Spring AI MCP annotations, which significantly lower the barrier to entry for building agentic AI systems.

Jan 28, 2026

AI News, LLM, Software Engineering

Taming LLM Output Chaos: A 3-Tier Normalisation Pattern

A 3-tier normalisation pattern achieves 100% collision detection in LLM-powered knowledge graph construction by addressing inconsistent outputs.

Jan 25, 2026

AI News, Mobile, LLM

Cactus v1: Cross-Platform LLM Inference on Mobile with Zero Latency and Full Privacy

Cactus v1 delivers sub-50ms time-to-first-token for on-device LLM inference, enabling mobile AI without network dependence or privacy concerns.

Dec 24, 2025

AI News, LLM, CLI

Toad: A Unified CLI for LLM Agents with Enhanced UX

Toad, a new CLI tool by Will McGugan, unifies access to 12+ LLM agents via the Agent Communication Protocol (ACP), aiming to improve the user experience of AI-assisted coding.

Dec 22, 2025

AI News, Java, LLM

Jlama: Running LLMs Locally in Java

Jlama 0.8.4 enables local LLM inference in Java, eliminating reliance on external APIs and offering greater control.

Dec 21, 2025

AI News, Python, LLM

Building Your First MCP Server in Python

This guide details building a complete MCP server in Python, demonstrating tools, resources, and prompts for LLM integration.

Dec 16, 2025

Python, AI, LangChain

LangChain Complete Guide: Building Production-Ready LLM Applications

Master LangChain for building production LLM applications. Learn chains, agents, memory systems, RAG, vector stores, evaluation, and deployment strategies with practical Python examples.

Nov 1, 2025

AI News, Observability, LLM

Why Observability Matters for AI Applications: A Deep Dive into LLM Monitoring

Sally O'Malley explains the unique observability challenges of Large Language Models (LLMs) and demonstrates how to implement an open-source observability stack using vLLM, Llama Stack, Prometheus, Grafana, and OpenTelemetry. She discusses key metrics for monitoring performance, cost, and quality, and the importance of tracing for debugging AI workloads.

Oct 20, 2025

AI, LLM, Github

Stock Weather AI

A compact AI toolkit that collects market data and news, runs lightweight evaluations, and produces per-ticker weather-style reports for stock analysis experiments.

Oct 4, 2025

AI News, LLM, Fine-tuning

20x Faster TRL Fine-tuning with RapidFire AI

Hugging Face TRL integrates with RapidFire AI, delivering 16–24x faster experimentation throughput for LLM fine-tuning.

Jun 3, 2025