scrape-sentinel: A Standard-Library Change Detection Layer for Web Scraping

These articles are AI-generated summaries. Please check the original sources for full details.

What changed since the last scrape? A small change-detection layer (stdlib only)

Developer Vinicius Pereira introduced scrape-sentinel, a new open-source Python library built only on the standard library. It addresses a common problem: detecting what changed between scraping runs. Instead of diffing records by position, it matches them by a key field, so a page that returns the same items in a different order does not produce false positives.
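
To make the key-based idea concrete, here is a minimal standard-library sketch. It is not scrape-sentinel's own code: `positional_diff`, `key_diff`, and `_strip` are hypothetical helpers written for illustration, and the `sku`, `price`, and `scraped_at` fields are sample data. With timestamps ignored in both cases, re-sorting two records makes the positional comparison flag both positions, while matching by key reports only the one real price change.

```python
from typing import Any

Record = dict[str, Any]


def _strip(record: Record, ignore_fields: tuple[str, ...]) -> Record:
    # Drop noisy fields (e.g. timestamps) so they never count as changes.
    return {k: v for k, v in record.items() if k not in ignore_fields}


def positional_diff(old: list[Record], new: list[Record],
                    ignore_fields: tuple[str, ...] = ()) -> int:
    """Naive approach: count list positions whose records differ."""
    # A re-sort moves records to new indices, so every shifted pair mismatches.
    mismatches = sum(
        1 for a, b in zip(old, new)
        if _strip(a, ignore_fields) != _strip(b, ignore_fields)
    )
    return mismatches + abs(len(old) - len(new))  # extra items also count


def key_diff(old: list[Record], new: list[Record], key: str,
             ignore_fields: tuple[str, ...] = ()) -> dict[str, list[Any]]:
    """Key-based approach: match records by a stable key, then compare."""
    old_by_key = {r[key]: r for r in old}
    new_by_key = {r[key]: r for r in new}
    return {
        "added": [k for k in new_by_key if k not in old_by_key],
        "removed": [k for k in old_by_key if k not in new_by_key],
        "changed": [
            k for k in new_by_key
            if k in old_by_key
            and _strip(old_by_key[k], ignore_fields) != _strip(new_by_key[k], ignore_fields)
        ],
    }


old = [{"sku": "A1", "price": 10, "scraped_at": "09:00"},
       {"sku": "B2", "price": 20, "scraped_at": "09:00"}]
# Same two products, re-sorted and re-scraped; only B2's price really changed.
new = [{"sku": "B2", "price": 18, "scraped_at": "10:00"},
       {"sku": "A1", "price": 10, "scraped_at": "10:00"}]

print(positional_diff(old, new, ("scraped_at",)))  # 2
print(key_diff(old, new, key="sku", ignore_fields=("scraped_at",)))
# {'added': [], 'removed': [], 'changed': ['B2']}
```

Note that the sketch also shows the "ignore fields" idea: without dropping `scraped_at`, every record would look changed on every run.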

Why This Matters

Most scrapers capture the current state but do not answer the question users actually care about: what changed? As a result, developers rebuild fragile diffing logic for each project and often get details wrong, such as treating reordered records as new data or corrupting state files when a run crashes midway.

Key Insights

  • Matching records by key rather than by position prevents false alerts from re-sorted pages or API responses (Pereira, 2026).
  • On the first run there is no earlier snapshot, so the tool silently records a baseline instead of reporting every item as ‘new’.
  • An “ignore fields” option excludes noisy values such as timestamps and tokens from comparisons.
  • Snapshots are written atomically (temp file, then rename), so a run that crashes midway cannot leave a corrupted snapshot that breaks future diffs.
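
The first-run baseline and the atomic write can both be sketched with the standard library alone. This is an assumption about how such a store might work, not scrape-sentinel's `SnapshotStore`: `load_snapshot`, `save_snapshot_atomic`, and `detect_added` are hypothetical names, and the snapshot is assumed to be a JSON list of dicts. The temp file is created next to the target because `os.replace` is only atomic on POSIX when both paths are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional

Records = list[dict[str, Any]]


def load_snapshot(path: Path) -> Optional[Records]:
    """Return the previous snapshot, or None if no run has saved one yet."""
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def save_snapshot_atomic(path: Path, records: Records) -> None:
    """Write to a temp file beside the target, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory means same filesystem, which os.replace needs to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(records, f)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see the old file or the new one
    except BaseException:
        os.unlink(tmp_name)  # remove the partial temp file; the old snapshot is untouched
        raise


def detect_added(path: Path, current: Records, key: str) -> Optional[set[Any]]:
    """Return keys added since the last run, or None on the baseline run."""
    previous = load_snapshot(path)
    if previous is None:
        added = None  # first run: nothing to compare against, so no alerts
    else:
        old_keys = {r[key] for r in previous}
        added = {r[key] for r in current if r[key] not in old_keys}
    save_snapshot_atomic(path, current)
    return added


with tempfile.TemporaryDirectory() as d:
    snap = Path(d) / "catalog.json"
    print(detect_added(snap, [{"sku": "A1"}], key="sku"))                 # None
    print(detect_added(snap, [{"sku": "A1"}, {"sku": "B2"}], key="sku"))  # {'B2'}
```

The diff is computed before the new snapshot is saved, so a failed write never replaces the previous state with a partial file.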

Working Examples

# Basic usage: diff two lists of dicts
# previous_records and current_records are your own lists of dicts
# from two scraping runs, each with a "sku" field.
from scrape_sentinel import diff

cs = diff(previous_records, current_records, key="sku", ignore_fields=["scraped_at"])
for r in cs.added:
    print("new:", r["sku"])
for changed in cs.changed:
    # Each changed record carries per-field deltas (old -> new).
    for d in changed.deltas:
        print(changed.key, d.field, d.old, "->", d.new)

# Full pipeline with I/O and alerts
# fetch_products() and SLACK_URL are placeholders for your own
# scraper function and Slack webhook URL.
from scrape_sentinel import (
    CallableSource,
    PipelineConfig,
    SnapshotStore,
    ConsoleAlerter,
    WebhookAlerter,
    run_once,
)

def scrape() -> list[dict]:
    return fetch_products()

config = PipelineConfig(
    key="sku",
    ignore_fields=["scraped_at"],
    alerters=[
        ConsoleAlerter(title="catalog", key_fields=("sku",)),
        WebhookAlerter(SLACK_URL),
    ],
)
# Run the pipeline once; snapshots are kept under ./.state.
changes = run_once(CallableSource(scrape), SnapshotStore("./.state"), config)
print(changes.summary())

Practical Applications

  • E-commerce monitoring: Detect price drops or stock changes for specific SKUs without false alarms from page reordering.
  • Inventory tracking: Identify products added or removed across crawl runs using stable keys such as UPC or model number.
  • Pitfall: Positional diffing produces large numbers of false-positive alerts whenever results come back in a different order.
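
For the e-commerce case, a price-drop check only needs records matched by key. The sketch below is a hypothetical standard-library helper, not part of scrape-sentinel, and the `sku` and `price` field names are assumptions about the scraped data. Because records are looked up by key, the re-sorted second run does not affect the result.

```python
from typing import Any


def price_drops(
    old: list[dict[str, Any]],
    new: list[dict[str, Any]],
    key: str = "sku",
    field: str = "price",
) -> list[tuple[Any, float, float]]:
    """Return (key, old_price, new_price) for records whose price decreased."""
    old_by_key = {r[key]: r for r in old}
    drops = []
    for rec in new:
        prev = old_by_key.get(rec[key])
        # Only records present in both runs can have a price change.
        if prev is not None and rec[field] < prev[field]:
            drops.append((rec[key], prev[field], rec[field]))
    return drops


old = [{"sku": "A1", "price": 19.99}, {"sku": "B2", "price": 5.00}]
new = [{"sku": "B2", "price": 4.50}, {"sku": "A1", "price": 19.99}]  # re-sorted
print(price_drops(old, new))  # [('B2', 5.0, 4.5)]
```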

References:
