I Built tfdrift: Free Terraform Drift Detection With Severity Alerts
These articles are AI-generated summaries. Please check the original sources for full details.
I Built a Free Terraform Drift Detector — Here’s Why
Sudarshan Thakur developed tfdrift, an open-source CLI tool that addresses two problems in infrastructure management: the high cost of commercial drift detection and the poor signal-to-noise ratio of raw plan output. In a test environment, the tool detected drifted EC2 instances within 4 seconds and categorized the changes by risk level.
Why This Matters
A plain ‘terraform plan’ has no notion of severity: a critical IAM policy modification and a minor tag update are reported the same way. For DevOps teams managing multiple workspaces, this produces alert fatigue, manual checks are run inconsistently, and security holes can persist unnoticed for months. Teams often end up choosing between enterprise solutions starting at $15,000 per year and basic, unmanaged cron jobs that cannot distinguish noise from genuine security incidents.
Key Insights
- Severity Classification: tfdrift uses pattern matching to categorize changes, labeling security group ingress modifications as CRITICAL and tag updates as LOW.
- Multi-workspace Scanning: The tool recursively discovers all Terraform workspaces within a directory, eliminating the need to manually execute plans in 20+ environments.
- Ignore Rules: A .tfdriftignore file filters out expected drift, such as ECS desired_count or Auto Scaling group capacity changes.
- Watch Mode: Continuous monitoring rechecks at a set interval (e.g., every 30 minutes) and can alert teams through a Slack webhook when drift occurs.
- CI/CD Integration: Specific exit codes (0 for clean, 1 for drift, 2 for error, 3 for remediated) allow pipelines to automatically fail on critical infrastructure changes.
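The exit codes in the last bullet can be turned into a pipeline decision with a small wrapper. This is a minimal sketch, not part of tfdrift: the function names are hypothetical, only the `scan --path` invocation and the four exit codes come from the article, and treating "remediated" as a passing result is an assumption you may want to reverse.

```python
import subprocess

# Exit codes as listed in the article; the labels are illustrative.
EXIT_MEANINGS = {
    0: "clean",
    1: "drift",
    2: "error",
    3: "remediated",
}


def interpret_exit_code(code: int) -> str:
    """Map a tfdrift exit code to a label; unlisted codes are reported as unknown."""
    return EXIT_MEANINGS.get(code, f"unknown ({code})")


def should_fail_pipeline(code: int) -> bool:
    """Fail on drift, on errors, and on any code the article does not list.

    Assumption: a 'remediated' result is allowed to pass.
    """
    return interpret_exit_code(code) not in ("clean", "remediated")


def run_scan(path: str) -> int:
    """Run the scan command shown in the article and return its exit code."""
    result = subprocess.run(["tfdrift", "scan", "--path", path])
    return result.returncode
```

In a CI job, `should_fail_pipeline(run_scan("./infrastructure"))` would decide whether to stop the deployment.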
Working Examples
Installation and basic scan of a local infrastructure directory.
pip install tfdrift
tfdrift scan --path ./infrastructure
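To illustrate what recursive workspace discovery under a path like ./infrastructure might involve, here is a rough sketch. It is not tfdrift's actual implementation: the function name is hypothetical, and it assumes that any directory directly containing a .tf file counts as a workspace, which would also pick up local module directories.

```python
from pathlib import Path


def find_workspaces(root: str) -> list[Path]:
    """Return directories under root that directly contain at least one .tf file."""
    root_path = Path(root)
    found = set()
    for tf_file in root_path.rglob("*.tf"):
        # Skip files inside Terraform's own .terraform/ cache directories.
        if ".terraform" in tf_file.relative_to(root_path).parts:
            continue
        found.add(tf_file.parent)
    return sorted(found)
```

Each returned directory could then be planned in turn, which is the manual step the tool is meant to replace.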
Setting up continuous monitoring with Slack notifications.
tfdrift watch --interval 30m --slack-webhook https://hooks.slack.com/services/XXX
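For readers unfamiliar with Slack incoming webhooks, the sketch below shows the kind of request such an alert boils down to: a JSON body with a `text` field sent by POST to the webhook URL. The message wording and function names are assumptions for illustration; the article does not show what tfdrift actually sends.

```python
import json
import urllib.request


def build_slack_request(
    webhook_url: str, workspace: str, severity: str, count: int
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    # Hypothetical message format; Slack only requires the "text" field.
    text = f"[{severity}] Terraform drift detected in {workspace}: {count} resource(s) changed"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Building the request separately from sending it keeps the payload easy to inspect or test without network access.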
Customizing severity rules in the .tfdrift.yml configuration file.
severity:
  critical:
    - aws_security_group.*.ingress
    - aws_iam_policy.*.policy
  high:
    - aws_instance.*.instance_type
    - aws_rds_instance.*.publicly_accessible
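One way to picture how patterns like these could drive severity classification is shell-style wildcard matching against a "resource_type.name.attribute" path. The sketch below uses Python's `fnmatch` as a stand-in; the article does not say which matching rules tfdrift applies, so the matcher, the "low" default, and the idea that ignore rules use the same pattern style are all assumptions.

```python
from fnmatch import fnmatchcase

# Patterns copied from the configuration example above.
SEVERITY_RULES = {
    "critical": ["aws_security_group.*.ingress", "aws_iam_policy.*.policy"],
    "high": ["aws_instance.*.instance_type", "aws_rds_instance.*.publicly_accessible"],
}
SEVERITY_ORDER = ["critical", "high"]  # checked from most to least severe
DEFAULT_SEVERITY = "low"  # assumption: anything unmatched is low


def classify(change_path: str) -> str:
    """Return the first severity whose patterns match the changed attribute path."""
    for severity in SEVERITY_ORDER:
        if any(fnmatchcase(change_path, p) for p in SEVERITY_RULES[severity]):
            return severity
    return DEFAULT_SEVERITY


def is_ignored(change_path: str, ignore_patterns: list[str]) -> bool:
    """Hypothetical ignore check, assuming ignore rules are wildcard patterns too."""
    return any(fnmatchcase(change_path, p) for p in ignore_patterns)
```

For example, `classify("aws_security_group.web.ingress")` falls under the critical patterns, while a tag change such as `aws_instance.app.tags` matches nothing and drops to the default.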
Practical Applications
- Security Compliance: Send Slack alerts automatically, even at 2 AM, when unauthorized security group or IAM policy changes occur in AWS production environments.
- Infrastructure Governance Pitfall: Relying on raw terraform plan output often means critical drift goes unnoticed because it is buried under ‘noise’ such as Auto Scaling group capacity fluctuations.
- CI/CD Gatekeeping: Integrate tfdrift into GitHub Actions to fail deployments if high-severity drift is detected, preventing ‘apply’ operations from conflicting with manual console changes.
- Cost Management Pitfall: Manual checks across 20+ workspaces are frequently skipped by engineers, leading to undetected oversized instances that increase cloud spend.
References:
- https://dev.to/sudarshan_thakur_1e141b99/i-built-tfdrift-free-terraform-drift-detection-with-severity-alerts-2n96
- https://github.com/sudarshan8417/tfdrift