Observability Framework: Choosing Between Errors, Traces, Logs, and Metrics
These articles are AI-generated summaries. Please check the original sources for full details.
Errors, traces, logs, metrics: when to reach for what
Sergiy Dybskiy defines a framework for instrumenting code using four telemetry signals. Sentry expanded its toolset by adding structured logs last year and Application Metrics in May 2024.
Why This Matters
While developers can technically overlap telemetry signals—such as counting log events as metrics—doing so disrupts specialized workflows like deduplication and waterfall visualization. Relying on the wrong signal often leads to visibility gaps; for example, sampled traces may miss the specific failing request that only a non-sampled structured log can reveal.
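To make the sampling gap concrete, here is a minimal sketch in plain Python. It is not Sentry's sampler: it assumes a hypothetical head-based sampler that keeps a fixed fraction of traces, while every request still writes one log record. The names `simulate`, `SAMPLE_RATE`, and the request IDs are illustrative only.

```python
import random

SAMPLE_RATE = 0.1  # assumption: roughly 10% of traces are kept


def simulate(num_requests: int, failing_id: int, seed: int = 0):
    """Return (sampled_trace_ids, log_records) for a batch of requests."""
    rng = random.Random(seed)
    sampled_traces = set()
    logs = []
    for request_id in range(num_requests):
        failed = request_id == failing_id
        # Logs are not sampled in this sketch: every request leaves a record.
        logs.append({"request_id": request_id, "failed": failed})
        # A trace is kept only when the random draw falls under the rate.
        if rng.random() < SAMPLE_RATE:
            sampled_traces.add(request_id)
    return sampled_traces, logs


traces, logs = simulate(num_requests=1000, failing_id=417)
failing_logs = [record for record in logs if record["failed"]]
print("trace kept for failing request:", 417 in traces)
print("log kept for failing request:", len(failing_logs) == 1)
```

Whether the failing request's trace survives depends on the random draw, but its log record is always present, which is the asymmetry the paragraph above describes.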
Key Insights
- Sentry’s error tracking dates back to 2012 and tracing to 2020, providing a foundation for deduplicated issue tracking.
- Traces function as timed waterfalls of spans to identify bottlenecks, such as an LLM tool call taking 8 seconds instead of 200ms.
- Metrics provide historical trend data via counters and gauges, allowing engineers to slice data (e.g., checkouts by region) and trigger alerts.
- Logs capture state at specific decision points—such as feature flag values—to explain ‘why’ a request behaved a certain way when traces are sampled out.
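The waterfall idea from the list above can be sketched without any SDK. The `Span` class, the span names, and the timings below are assumptions made up for illustration; the only detail taken from the summary is a tool call that takes 8 seconds.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    start_ms: int  # offset from the start of the trace
    duration_ms: int


def render_waterfall(spans, width=40):
    """Return one text line per span, with a bar placed on a shared timeline."""
    total = max(s.start_ms + s.duration_ms for s in spans)
    lines = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        offset = round(s.start_ms / total * width)
        length = max(1, round(s.duration_ms / total * width))
        lines.append(f"{s.name:<16}{' ' * offset}{'#' * length} {s.duration_ms}ms")
    return lines


def slowest(spans):
    """Return the span with the longest duration."""
    return max(spans, key=lambda s: s.duration_ms)


# Hypothetical child spans of a single request.
trace = [
    Span("db.get_user", 0, 40),
    Span("llm.tool_call", 40, 8000),
    Span("render", 8040, 60),
]

for line in render_waterfall(trace):
    print(line)
print("bottleneck:", slowest(trace).name)
```

Laying the spans out on one timeline is what makes the slow tool call obvious at a glance, which a flat list of log lines would not show as directly.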
Working Examples
FastAPI handler demonstrating combined use of span attributes, structured logs, and metrics.
    import sentry_sdk
    from fastapi import FastAPI
    from sentry_sdk import logger

    app = FastAPI()

    # NOTE: `db` and `flag_enabled` are application-specific helpers that this
    # excerpt does not define; they are assumed to exist elsewhere in the app.

    # The route is auto-instrumented. FastAPI gives the request span;
    # the DB integration gives a span for every query below.
    @app.get("/recommendations/{user_id}")
    def get_recommendations(user_id: int):
        user = db.get_user(user_id)  # auto-instrumented db span
        use_v2 = flag_enabled("ranking_v2", user)
        ranking_version = "v2" if use_v2 else "v1"

        candidates = db.personalized_recs(user_id, version=ranking_version)  # auto db span
        outcome = "personalized" if candidates else "fallback"
        items = candidates or db.popular_items()  # auto db span on the fallback

        # SPAN ATTRIBUTE: context about THIS request's flow, read inside the trace.
        span = sentry_sdk.get_current_span()
        if span is not None:  # no active span when tracing is off
            span.set_data("ranking_version", ranking_version)
            span.set_data("recommendation.outcome", outcome)

        # LOG: the trail through the decision tree, recording *why*.
        logger.info(
            "recommendations lookup",
            attributes={
                "user_id": user_id,
                "ranking_version": ranking_version,
                "flag.ranking_v2": use_v2,
                "source_table": f"recommendations_{ranking_version}",
                "candidate_count": len(candidates),
                "outcome": outcome,
            },
        )

        # METRIC: the rate across all requests, sliceable by version and outcome.
        sentry_sdk.metrics.count(
            "recommendations.served",
            1,
            attributes={"ranking_version": ranking_version, "outcome": outcome},
        )

        return items
Practical Applications
- Use case: A storefront API uses metrics to detect that a v2 feature flag rollout is causing an increase in fallback outcomes across a cohort of users.
- Pitfall: Relying only on wide events instead of purpose-built signals prevents the data from being rendered as a waterfall or grouped into deduplicated Issues.
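The storefront use case above can be sketched as a counter sliced by attributes. This is a plain-Python stand-in, not the Sentry metrics API: the in-memory `counters` store, the `count` and `fallback_rate` helpers, the traffic numbers, and `ALERT_THRESHOLD` are all hypothetical.

```python
from collections import Counter

# In-memory stand-in for a metrics backend (assumption, for illustration only).
counters = Counter()


def count(name, value=1, attributes=None):
    """Increment a counter keyed by metric name plus its sorted attributes."""
    key = (name, tuple(sorted((attributes or {}).items())))
    counters[key] += value


def fallback_rate(ranking_version):
    """Share of served recommendations that were fallbacks for one version."""
    total = 0
    fallbacks = 0
    for (name, attrs), value in counters.items():
        attributes = dict(attrs)
        if name != "recommendations.served":
            continue
        if attributes.get("ranking_version") != ranking_version:
            continue
        total += value
        if attributes.get("outcome") == "fallback":
            fallbacks += value
    return fallbacks / total if total else 0.0


# Made-up traffic: v2 falls back far more often than v1.
traffic = [("v1", "personalized", 95), ("v1", "fallback", 5),
           ("v2", "personalized", 60), ("v2", "fallback", 40)]
for version, outcome, requests in traffic:
    for _ in range(requests):
        count("recommendations.served",
              attributes={"ranking_version": version, "outcome": outcome})

ALERT_THRESHOLD = 0.2  # hypothetical alerting threshold
versions_over_threshold = [
    v for v in ("v1", "v2") if fallback_rate(v) > ALERT_THRESHOLD
]
print(versions_over_threshold)
```

Because the counter is keyed by `ranking_version` and `outcome`, the regression shows up as a rate for the v2 cohort without anyone having to read individual requests.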