llm

23 articles in this category

Python, LLM, AI

Codexity Part 6: Small Model Inference with llama-cpp-python

Run a quantized 7B model locally to generate cited answers from scraped web content. Choose between Qwen, Mistral, Phi, and Llama models. Build prompts that make small models behave like large ones.

Python, AI, LLM

Codexity Part 2: Query Rewriting with LLMs

A user types a vague question. The query rewriter transforms it into targeted search queries using a local LLM. We cover intent classification, query decomposition, and prompt engineering that actually works with small models.

AI News, LLM, Software Engineering

Understanding LLM API Architecture: Request Patterns, Tokenization, and Cost Optimization

Learn how LLM APIs function under the hood, and why output tokens can cost 3–5× more than input tokens.

AI News, LLM, Observability

Beyond the Green Dot: Advanced LLM Observability Lessons from OpenAI Outages

OpenAI's status page lagged 90 minutes during the April 2026 outage; instrumenting five key signals like TTFT and token throughput is essential for reliable AI infrastructure.

AI News, LLM, Observability

Essential Observability: 3 Critical Alerts for LLM Systems

Prevent runaway LLM costs and quality drift using OpenTelemetry GenAI conventions to monitor per-trace spend and retrieval relevance.

ai, agents, python

AI Agents from Scratch Part 6: Complete Agent & Best Practices (Research Report Generator)

The finale! Run your complete Research Report Generator, learn best practices, explore advanced memory strategies, and discover how to extend your agent with new capabilities.

AI Agents, LLM, Architecture

How I Built an AI System That Writes Full-Length Books

A multi-agent pipeline that autonomously generates complete books—from technical manuals to fantasy novels—with built-in research, quality control, and hallucination prevention.

ai, agents, python

AI Agents from Scratch Part 5: The Agent Core & Loop (Research Report Generator)

Build the brain of your AI agent! Implement the ReAct loop, system prompts, tool execution, and phase handlers that orchestrate the entire research workflow.

ai, agents, python

AI Agents from Scratch Part 4: Human-in-the-Loop Validation (Research Report Generator)

Keep humans in control of AI agents. Build checkpoints for plan approval, source selection, fact verification, and draft review—so agents stay helpful without going rogue.

ai, agents, python

AI Agents from Scratch Part 3: State Management & Memory (Research Report Generator)

Give your AI agent a memory! Learn short-term vs long-term memory, prevent context overflow, and enable agents to resume interrupted work.

AI News, AI, LLM

LLM Grounding: Connecting Language Models to Reality

Grounding reduces hallucinations by 42–68%.

ai, agents, python

AI Agents from Scratch Part 2: Building the Tool System (Research Report Generator)

Give your AI agent superpowers! Build a clean tool system with web search, content extraction, and file operations—the foundation that lets agents interact with the real world.

ai, agents, python

AI Agents from Scratch Part 1: Understanding the ReAct Pattern (Research Report Generator)

Start your journey building AI agents without frameworks. Learn the foundational ReAct pattern that powers modern agents—with a hands-on Research Report Generator example.

AI News, Spring AI, LLM

Overview of MCP Annotations in Spring AI

Explore the Spring AI MCP annotations to significantly lower the barrier to entry for building agentic AI systems.

AI News, LLM, Software Engineering

Taming LLM Output Chaos: A 3-Tier Normalisation Pattern

A 3-tier normalisation pattern achieves 100% collision detection in LLM-powered knowledge graph construction by addressing inconsistent outputs.

AI News, Mobile, LLM

Cactus v1: Cross-Platform LLM Inference on Mobile with Zero Latency and Full Privacy

Cactus v1 delivers sub-50ms time-to-first-token for on-device LLM inference, enabling mobile AI without network dependence or privacy concerns.

AI News, LLM, CLI

Toad: A Unified CLI for LLM Agents with Enhanced UX

Toad, a new CLI tool by Will McGugan, unifies access to 12+ LLM agents via the Agent Communication Protocol (ACP), aiming to improve the user experience of AI-assisted coding.

AI News, Java, LLM

Jlama: Running LLMs Locally in Java

Jlama 0.8.4 enables local LLM inference in Java, eliminating reliance on external APIs and offering greater control.

AI News, Python, LLM

Building Your First MCP Server in Python

This guide details building a complete MCP server in Python, demonstrating tools, resources, and prompts for LLM integration.

Python, AI, LangChain

LangChain Complete Guide: Building Production-Ready LLM Applications

Master LangChain for building production LLM applications. Learn chains, agents, memory systems, RAG, vector stores, evaluation, and deployment strategies with practical Python examples.

AI News, Observability, LLM

Why Observability Matters for AI Applications: A Deep Dive into LLM Monitoring

Sally O'Malley explains the unique observability challenges of Large Language Models (LLMs) and demonstrates how to implement an open-source observability stack using vLLM, Llama Stack, Prometheus, Grafana, and OpenTelemetry. She discusses key metrics for monitoring performance, cost, and quality, and the importance of tracing for debugging AI workloads.

AI, LLM, Github

Stock Weather AI

A compact AI toolkit that collects market data and news, runs lightweight evaluations, and produces per-ticker weather-style reports for stock analysis experiments.

AI News, LLM, Fine-tuning

20x Faster TRL Fine-tuning with RapidFire AI

Hugging Face TRL integrates with RapidFire AI, delivering 16–24x faster experimentation throughput for LLM fine-tuning.
