
Building a Serverless Scanner to Detect and Manage Zombie AWS Resources


These articles are AI-generated summaries. Please check the original sources for full details.

How I Built a Serverless Scanner to Find (and Kill) Zombie AWS Resources

Roberto Belotti built aws-zombie-hunter to eliminate the silent budget drain caused by orphaned cloud infrastructure. The tool runs as a container-based Lambda function that typically uncovers hundreds of dollars in waste while costing only $0.10 per month to operate.

Why This Matters

Infrastructure frequently outlives the context that created it: projects get cancelled, and proofs of concept (POCs) are never properly decommissioned. Tools like AWS Cost Explorer show what you are spending, but they cannot tell whether a resource is still needed. As a result, users keep paying for idle assets such as a $219/month SFTP server or an orphaned NAT Gateway at $32/month. In most mature AWS accounts, these ‘zombie’ resources are the default state rather than the exception. That calls for automated discovery tools that understand how resources relate to each other instead of looking only at raw costs.

Key Insights

  • The scanner follows a Registry design pattern: every resource-specific scanner inherits from BaseScanner, so new resource checks can be added without modifying the core handler.
  • Parallel execution via ThreadPoolExecutor reduced scanning latency from three minutes to 45 seconds by handling I/O-bound AWS API calls concurrently.
  • Since February 2024, AWS charges $0.005 per hour (roughly $3.60/month) for every public IPv4 address, so each unassociated Elastic IP is now a steady, avoidable cost.
  • A static prices.json file is used for cost estimation instead of the AWS Price List API to avoid high latency and complex response parsing during execution.
  • The project reaches 90% test coverage by mocking AWS services with the Moto library, so the CI/CD pipeline never calls real AWS endpoints.
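
The static price table can be as small as a JSON map from resource type to hourly rate. The sketch below shows one possible shape and rests on three assumptions:

- The `prices.json` schema is hypothetical. The keys `elastic_ip`, `nat_gateway` and `transfer_sftp_server` are illustrative, not the project's.
- The rates are us-east-1 on-demand hourly prices at the time of writing.
- Monthly estimates use 730 hours per month, the convention the AWS Pricing Calculator uses, so results can differ slightly from the rounded figures quoted above.

Data-processing and transfer charges are ignored.

```python
import json

# Hypothetical contents of prices.json: hourly on-demand rates in USD (us-east-1).
# Data-processing and data-transfer charges are deliberately left out.
PRICES_JSON = """
{
  "elastic_ip": 0.005,
  "nat_gateway": 0.045,
  "transfer_sftp_server": 0.30
}
"""

HOURS_PER_MONTH = 730  # Monthly-hours convention used by the AWS Pricing Calculator


def load_prices(raw: str) -> dict[str, float]:
    """Parse the static price table once; no Price List API call at runtime."""
    return {key: float(rate) for key, rate in json.loads(raw).items()}


def monthly_cost(prices: dict[str, float], resource_type: str, count: int = 1) -> float:
    """Estimate the monthly cost of `count` idle resources of one type."""
    hourly = prices.get(resource_type)
    if hourly is None:
        return 0.0  # Unknown types are reported without a cost estimate
    return round(hourly * HOURS_PER_MONTH * count, 2)


prices = load_prices(PRICES_JSON)
print(monthly_cost(prices, "transfer_sftp_server"))  # 219.0
print(monthly_cost(prices, "nat_gateway"))           # 32.85
print(monthly_cost(prices, "elastic_ip", count=3))   # 10.95
```

Because the table ships with the function, estimates add no latency at runtime. The trade-off is that the file has to be refreshed by hand when AWS changes its prices.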

Working Examples

The common interface for all resource-specific scanners.

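```python
from __future__ import annotations  # Lazy annotations: ResourceType/ZombieResource are project types defined elsewhere

from abc import ABC, abstractmethod

import boto3


class BaseScanner(ABC):
    """Common interface that every resource-specific scanner implements."""

    VERSION: str = "1.0.0"

    def __init__(self, session: boto3.Session, regions: list[str]):
        self.session = session  # Shared boto3 session used to create regional clients
        self.regions = regions  # Regions this scanner should inspect

    @property
    @abstractmethod
    def resource_type(self) -> ResourceType:
        ...

    @abstractmethod
    def scan(self) -> list[ZombieResource]:
        """Return every zombie resource found across self.regions."""
        ...
```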
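
To show how a concrete check could plug into this interface, here is a minimal sketch of an unassociated Elastic IP scanner. It is an illustration, not the project's code:

- `ResourceType`, `ZombieResource` and their fields are hypothetical stand-ins.
- The class skips subclassing `BaseScanner` only so that the block runs on its own.

It relies on the fact that entries returned by EC2 `describe_addresses` carry an `AssociationId` only while the address is attached to an instance or network interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any


class ResourceType(Enum):  # Hypothetical stand-in for the project's enum
    ELASTIC_IP = "elastic_ip"


@dataclass
class ZombieResource:  # Hypothetical stand-in; the real fields may differ
    resource_type: ResourceType
    resource_id: str
    region: str
    reason: str


class ElasticIPScanner:  # In the project this would presumably subclass BaseScanner
    def __init__(self, session: Any, regions: list[str]):
        self.session = session  # boto3.Session in production; any object with .client()
        self.regions = regions

    @property
    def resource_type(self) -> ResourceType:
        return ResourceType.ELASTIC_IP

    def scan(self) -> list[ZombieResource]:
        zombies: list[ZombieResource] = []
        for region in self.regions:
            ec2 = self.session.client("ec2", region_name=region)
            # Read-only call; unassociated addresses have no AssociationId key.
            for addr in ec2.describe_addresses()["Addresses"]:
                if "AssociationId" not in addr:
                    zombies.append(ZombieResource(
                        resource_type=self.resource_type,
                        resource_id=addr.get("AllocationId", addr.get("PublicIp", "unknown")),
                        region=region,
                        reason="Elastic IP is not associated with any resource",
                    ))
        return zombies


# Usage with real credentials:
# import boto3
# zombies = ElasticIPScanner(boto3.Session(), ["us-east-1"]).scan()
```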

The Lambda handler orchestrating the discovery, scanning, and reporting process.

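```python
import boto3

# load_config, ScannerRegistry, ScanResult, save_to_s3 and notify are project helpers not shown here.
def lambda_handler(event, context):
    config = load_config()  # Regions, S3 bucket/prefix, optional SNS topic
    session = boto3.Session()
    scanners = ScannerRegistry.discover()  # Registered scanner classes
    results = ScannerRegistry.run_all(scanners, session, config.regions)
    report = ScanResult(
        zombies=results.zombies,
        errors=results.errors,
        regions_scanned=config.regions,
    )
    save_to_s3(report, config.bucket, config.prefix)  # Structured JSON report for trend analysis
    if config.sns_topic:
        notify(report.summary(), config.sns_topic)  # Optional alert; nothing is deleted
    return report.summary()
```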
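
The handler relies on `ScannerRegistry.discover()` and `ScannerRegistry.run_all()`, whose internals the summary does not show. It only credits ThreadPoolExecutor with cutting scan time from three minutes to 45 seconds. The following is a minimal sketch under three assumptions:

- Scanners register themselves with a hypothetical `register` decorator.
- `run_all` submits one thread-pool task per scanner class.
- A failing scanner is recorded as an error instead of aborting the run.

`RunResults`, the `max_workers` default, the error format, and the demo scanners are illustrative.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RunResults:  # Hypothetical result container
    zombies: list[Any] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


class ScannerRegistry:
    _scanners: list[type] = []

    @classmethod
    def register(cls, scanner_cls: type) -> type:
        """Class decorator: adding a scanner never requires editing the handler."""
        cls._scanners.append(scanner_cls)
        return scanner_cls

    @classmethod
    def discover(cls) -> list[type]:
        return list(cls._scanners)

    @staticmethod
    def run_all(scanners: list[type], session: Any, regions: list[str],
                max_workers: int = 8) -> RunResults:
        results = RunResults()

        def run_one(scanner_cls: type) -> list[Any]:
            # Instantiate inside the worker so constructor errors are isolated too.
            return scanner_cls(session, regions).scan()

        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {pool.submit(run_one, c): c.__name__ for c in scanners}
            for future in as_completed(futures):
                name = futures[future]
                try:
                    results.zombies.extend(future.result())
                except Exception as exc:  # One broken scanner must not sink the whole run
                    results.errors.append(f"{name}: {exc}")
        return results


# --- Demo with two hypothetical scanners (no AWS calls) ---
@ScannerRegistry.register
class DemoScanner:
    def __init__(self, session: Any, regions: list[str]):
        self.regions = regions

    def scan(self) -> list[str]:
        time.sleep(0.1)  # Simulates waiting on an AWS API response
        return [f"demo-zombie-{r}" for r in self.regions]


@ScannerRegistry.register
class BrokenScanner:
    def __init__(self, session: Any, regions: list[str]):
        pass

    def scan(self) -> list[str]:
        raise RuntimeError("AccessDenied")


results = ScannerRegistry.run_all(ScannerRegistry.discover(), session=None,
                                  regions=["us-east-1", "eu-west-1"])
print(results.zombies)  # ['demo-zombie-us-east-1', 'demo-zombie-eu-west-1']
print(results.errors)   # ['BrokenScanner: AccessDenied']
```

The decorator keeps registration explicit. Walking `BaseScanner.__subclasses__()` would be another way to let new scanners appear without touching the handler. Threads suit this workload because each scanner spends most of its time blocked on network I/O, during which the GIL is released.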

Practical Applications

  • Use Case: Automating weekly infrastructure audits via EventBridge triggers to save structured JSON reports in S3 for long-term trend analysis.
  • Pitfall: Automatically terminating resources without human review. This tool avoids that by running under read-only IAM policies, so it can report on resources but cannot modify or delete them.
  • Use Case: Detecting stopped RDS instances, which AWS automatically restarts after 7 days; flagging them keeps unused database environments from quietly resuming full billing.
  • Pitfall: Relying on standard Zip-based Lambda deployments for complex scanners. Zip packages are capped at 250 MB unzipped, a limit that large libraries such as Boto3 and Moto can push against; container images (up to 10 GB) avoid it.
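
For the weekly-audit use case, each run's report needs a predictable S3 location. This is a minimal sketch of a `save_to_s3` whose first three parameters match the handler's call, built on a few assumptions:

- The report has already been converted to a JSON-serializable dict.
- Keys use Hive-style `year=/month=/day=` prefixes. This layout is illustrative, not necessarily the project's. It could let a query engine such as Athena prune reports by date.
- The optional `s3_client` and `scanned_at` parameters are hypothetical additions that make the function testable without AWS.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Any


def report_key(prefix: str, scanned_at: datetime) -> str:
    """Date-partitioned object key; expects a timezone-aware datetime."""
    ts = scanned_at.astimezone(timezone.utc)
    path = f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/scan-{ts:%Y%m%dT%H%M%SZ}.json"
    prefix = prefix.strip("/")
    return f"{prefix}/{path}" if prefix else path


def save_to_s3(report: dict, bucket: str, prefix: str,
               s3_client: Any = None, scanned_at: datetime | None = None) -> str:
    """Write one scan report as JSON and return the S3 key used."""
    if s3_client is None:
        import boto3  # Imported lazily so the key logic is testable without boto3
        s3_client = boto3.client("s3")
    scanned_at = scanned_at or datetime.now(timezone.utc)
    key = report_key(prefix, scanned_at)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(report, indent=2, default=str).encode("utf-8"),
        ContentType="application/json",
    )
    return key


# Usage inside the handler (requires AWS credentials; report_dict is hypothetical):
# key = save_to_s3(report_dict, config.bucket, config.prefix)
```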
