Optimizing AI Expenditures with llm-spend: A Python Profiler for LLM Costs

These articles are AI-generated summaries. Please check the original sources for full details.

I Built a Profiler for My LLM Bill (and It Saved Me $30/month)

Developer Lakshmi Sravya Vedantham created llm-spend, a Python-based profiling tool designed to provide visibility into hidden AI API expenses. The tool revealed that a single summarization feature accounted for nearly 88% of a $47 OpenAI bill.

Why This Matters

CPU and memory usage can be inspected at any time with tools like htop or psutil, but LLM costs stay invisible until the monthly invoice arrives. Without that observability, inefficient prompts or high output token counts can run up significant costs that cannot be attributed to specific functions in the code.

Key Insights

  • Output tokens are the primary cost driver, often priced 3-5x higher than input tokens: GPT-4o charges $2.50 per 1M input tokens vs $10.00 per 1M output tokens (4x), and Claude Sonnet charges $3.00 vs $15.00 (5x).
  • The major LLM SDKs from OpenAI, Anthropic, and Google Gemini each expose token usage fields on their response objects, enabling cost tracking via simple attribute inspection.
  • The Python inspect.stack function allows the profiler to programmatically attribute costs to the exact source file and function name that triggered the request.
  • Local SQLite databases offer a zero-config, persistent storage solution for developer tools without the overhead of remote infrastructure.
  • Summarization tasks are disproportionately expensive compared to classification due to higher output token volume.
  • llm-spend provides terminal-based reporting to break down costs by file, model, or function label.
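
The pricing arithmetic behind the first and fifth insights can be sketched as follows. This is a minimal illustration, not llm-spend's own code: the price table copies the per-1M-token figures quoted above, while `PRICES_PER_MILLION`, `estimate_cost`, and the token counts for the two workloads are hypothetical names and values chosen for the example.

```python
# Per-1M-token prices in USD as quoted in the list above: (input, output).
PRICES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workloads that both read the same 2,000-token article.
summarize = estimate_cost("gpt-4o", input_tokens=2_000, output_tokens=500)
classify = estimate_cost("gpt-4o", input_tokens=2_000, output_tokens=5)

print(f"summarize: ${summarize:.6f}")
print(f"classify:  ${classify:.6f}")
```

With these assumed counts, the summarization call costs roughly twice as much as the classification call even though both read the same input, because its 500 output tokens cost as much as the 2,000 input tokens.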

Working Examples

A decorator-based approach to automatically log token usage and costs to a local SQLite database.

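from openai import OpenAI
from llm_spend import track

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

@track(model="gpt-4o", label="summarize")
def summarize_article(text: str):
    # The decorator logs token usage and cost for each call.
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
    )
    return response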
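
To make the mechanism concrete, here is one way a decorator of this kind could combine `inspect.stack`, usage-field inspection, and SQLite. It is a sketch under stated assumptions, not llm-spend's actual implementation: `track_sketch`, the `calls` table schema, `fake_summarize`, and `handle_request` are all hypothetical, the database is in-memory rather than a file, and the stand-in response mimics the `usage.prompt_tokens` / `usage.completion_tokens` fields of an OpenAI chat completion.

```python
import functools
import inspect
import os
import sqlite3
from types import SimpleNamespace

# Assumed prices per 1M tokens in USD: (input, output), as quoted in the article.
PRICES = {"gpt-4o": (2.50, 10.00)}

# In-memory database for the sketch; a persistent tool would use a file path.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE calls (file TEXT, function TEXT, model TEXT, label TEXT,"
    " input_tokens INTEGER, output_tokens INTEGER, cost_usd REAL)"
)


def track_sketch(model: str, label: str):
    """Hypothetical decorator: log usage and cost of each call to SQLite."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stack()[0] is this wrapper; stack()[1] is the code that called it.
            caller = inspect.stack()[1]
            response = func(*args, **kwargs)
            usage = response.usage  # OpenAI-style usage object
            in_price, out_price = PRICES[model]
            cost = (
                usage.prompt_tokens * in_price
                + usage.completion_tokens * out_price
            ) / 1_000_000
            db.execute(
                "INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?, ?)",
                (os.path.basename(caller.filename), caller.function, model,
                 label, usage.prompt_tokens, usage.completion_tokens, cost),
            )
            return response

        return wrapper

    return decorator


@track_sketch(model="gpt-4o", label="summarize")
def fake_summarize(text: str):
    # Stand-in for a real API call; only the usage fields matter here.
    usage = SimpleNamespace(prompt_tokens=1200, completion_tokens=300)
    return SimpleNamespace(usage=usage)


def handle_request():
    return fake_summarize("some article text")


handle_request()

# A small report: total cost grouped by label and calling function.
query = "SELECT label, function, SUM(cost_usd) FROM calls GROUP BY label, function"
for row in db.execute(query):
    print(row)
```

Recording the caller's frame rather than the decorated function's own name is what lets a report point at the code path that triggered the spend, here `handle_request`.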

A context manager for manual token tracking, useful for streaming responses or custom SDKs.

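import anthropic
from llm_spend import spending

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with spending("claude-sonnet-4", label="classify") as s:
    response = client.messages.create(...)  # request arguments elided in the source
    s.input_tokens = response.usage.input_tokens
    s.output_tokens = response.usage.output_tokens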
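
The manual pattern above can be approximated with `contextlib.contextmanager`: the caller fills in token counts inside the block and the cost is computed on exit. This is only a sketch of the idea, not llm-spend's code. `spending_sketch`, `Spend`, and the token counts are hypothetical, and applying the article's Claude Sonnet prices to the "claude-sonnet-4" label is an assumption.

```python
from contextlib import contextmanager
from dataclasses import dataclass

# Assumed prices per 1M tokens in USD: (input, output), from the article's
# Claude Sonnet figures.
PRICES = {"claude-sonnet-4": (3.00, 15.00)}


@dataclass
class Spend:
    """Mutable record that the caller fills in inside the with-block."""
    model: str
    label: str
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0


@contextmanager
def spending_sketch(model: str, label: str):
    record = Spend(model, label)
    try:
        yield record
    finally:
        # Compute the cost on exit, once the caller has set the token counts.
        in_price, out_price = PRICES[model]
        record.cost_usd = (
            record.input_tokens * in_price + record.output_tokens * out_price
        ) / 1_000_000


with spending_sketch("claude-sonnet-4", label="classify") as s:
    # In streaming code these counts would come from the final usage data.
    s.input_tokens = 800
    s.output_tokens = 12

print(f"{s.label}: ${s.cost_usd:.6f}")
```

Deferring the calculation to the exit step is what makes this shape suit streaming responses, where the final token counts are only known after the last chunk arrives.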

Practical Applications

  • Use case: Fine-grained cost attribution for Python-based AI agents. Pitfall: Relying on provider dashboards alone gives no visibility into spend per feature.
  • Use case: Benchmarking model efficiency by comparing costs across Gemini, GPT, and Claude models. Pitfall: Ignoring output-to-input price ratios leads to unexpected budget exhaustion in text-generation tasks.
