Solved: Automating AWS EC2 Snapshots with Lambda & CloudWatch Events

These articles are AI-generated summaries. Please check the original sources for full details.

Managing EC2 snapshots by hand is error-prone and costly, and a missed backup can lead to data loss and operational disruption. This tutorial shows how to automate EBS snapshot creation for EC2 instances with AWS Lambda and CloudWatch Events (now Amazon EventBridge), so that critical volumes are backed up on a schedule without manual work.

Why This Matters

Backup policies assume that someone runs them correctly every time. In practice, people make mistakes and policies are applied inconsistently. When a snapshot is missing, the cost of the resulting data loss can range from hours of recovery time to millions in lost revenue, depending on the system affected. Automating snapshot creation reduces these risks and keeps backups consistent.

Key Insights

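  • A dedicated IAM role for the Lambda function requires the ec2:DescribeInstances, ec2:DescribeVolumes, ec2:CreateSnapshot, and ec2:CreateTags permissions, plus logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents so the function can write to CloudWatch Logs.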
  • The Python Lambda function, using boto3, iterates through running EC2 instances, identifies EBS volumes, and creates snapshots with descriptive tags.
  • An Amazon CloudWatch Events (now Amazon EventBridge) rule with a cron schedule triggers the Lambda function periodically, automating the snapshot process.
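
The permissions in the first insight can be sketched as an IAM policy document. This is a minimal illustration, not the article's own policy: the function name `build_snapshot_policy`, the statement IDs, and the broad `Resource` values are assumptions, and a production policy would likely be scoped more tightly. The logs statement covers logs:CreateLogGroup and logs:CreateLogStream alongside logs:PutLogEvents, since a function typically needs all three to write to CloudWatch Logs.

```python
import json


def build_snapshot_policy():
    """Return an IAM policy document (as a dict) for the snapshot Lambda.

    Hypothetical helper for illustration; Sids and Resource scopes are assumptions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SnapshotVolumes",
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeInstances",
                    "ec2:DescribeVolumes",
                    "ec2:CreateSnapshot",
                    "ec2:CreateTags",
                ],
                # Broad on purpose for the sketch; narrow this where possible.
                "Resource": "*",
            },
            {
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


# Print the JSON so it can be pasted into the IAM console or a template.
print(json.dumps(build_snapshot_policy(), indent=2))
```

The printed JSON can be attached to the Lambda's execution role as an inline policy.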

Working Example

import datetime
import os

import boto3


def lambda_handler(event, context):
    # Lambda sets AWS_REGION itself; the fallback only matters for local runs.
    ec2 = boto3.client('ec2', region_name=os.environ.get('AWS_REGION', 'us-east-1'))
    # Take one UTC timestamp so every snapshot from this run is labeled consistently.
    now = datetime.datetime.now(datetime.timezone.utc)
    try:
        # Paginate so that instances beyond the first page of results are not skipped.
        paginator = ec2.get_paginator('describe_instances')
        pages = paginator.paginate(
            Filters=[
                {'Name': 'instance-state-name', 'Values': ['running']}
            ]
        )
        for page in pages:
            for reservation in page['Reservations']:
                for instance in reservation['Instances']:
                    instance_id = instance['InstanceId']
                    # Fall back to a placeholder when the instance has no Name tag.
                    instance_name = 'No-Name'
                    for tag in instance.get('Tags', []):
                        if tag['Key'] == 'Name':
                            instance_name = tag['Value']
                            break
                    print(f"Processing instance: {instance_id} ({instance_name})")
                    # Snapshot every EBS volume attached to the instance.
                    for block_device_mapping in instance.get('BlockDeviceMappings', []):
                        if 'Ebs' in block_device_mapping:
                            volume_id = block_device_mapping['Ebs']['VolumeId']
                            description = (
                                f"Automated snapshot of {volume_id} "
                                f"attached to {instance_id} ({instance_name}) "
                                f"created on {now.strftime('%Y-%m-%d %H:%M:%S')} UTC."
                            )
                            print(f"Creating snapshot for volume: {volume_id}")
                            snapshot = ec2.create_snapshot(
                                VolumeId=volume_id,
                                Description=description,
                                TagSpecifications=[
                                    {
                                        'ResourceType': 'snapshot',
                                        'Tags': [
                                            {'Key': 'CreatedBy', 'Value': 'Lambda'},
                                            {'Key': 'Automation', 'Value': 'EC2SnapshotTool'},
                                            {'Key': 'InstanceId', 'Value': instance_id},
                                            {'Key': 'InstanceName', 'Value': instance_name},
                                            {'Key': 'VolumeId', 'Value': volume_id},
                                            {'Key': 'Name', 'Value': f"{instance_name}-{volume_id}-snapshot-{now.strftime('%Y%m%d%H%M')}"}
                                        ]
                                    }
                                ]
                            )
                            print(f"Snapshot created: {snapshot['SnapshotId']}")
    except Exception as e:
        # Log the error, then re-raise so the invocation is recorded as failed.
        print(f"Error creating snapshots: {e}")
        raise
    return {
        'statusCode': 200,
        'body': 'EC2 snapshots created successfully!'
    }
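
The article configures the schedule without showing the calls involved. As a rough sketch, the same wiring could be done with the boto3 `events` and `lambda` clients. The helper name `schedule_snapshot_lambda`, the rule name, the statement and target IDs, and the 02:00 UTC schedule are all assumptions chosen for illustration. The clients are passed in as arguments so the helper does not depend on how they are created.

```python
def schedule_snapshot_lambda(
    events_client,
    lambda_client,
    function_arn,
    rule_name="daily-ec2-snapshots",   # hypothetical rule name
    schedule="cron(0 2 * * ? *)",      # assumed schedule: every day at 02:00 UTC
):
    """Create a scheduled rule and point it at the snapshot Lambda.

    events_client and lambda_client are expected to be boto3 clients, e.g.
    boto3.client("events") and boto3.client("lambda").
    """
    # 1. Create (or update) the scheduled rule.
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]

    # 2. Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Register the function as the rule's target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "snapshot-lambda", "Arn": function_arn}],
    )
    return rule_arn
```

With real clients this would be called as `schedule_snapshot_lambda(boto3.client("events"), boto3.client("lambda"), function_arn)`. Running it a second time with the same rule name is expected to fail at `add_permission`, because the statement ID already exists, so a real deployment would need to handle that case.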

Practical Applications

  • TechResolve: Automates daily snapshots of production EC2 instances to ensure rapid recovery in case of failure.
  • Pitfall: Lambda's default timeout is 3 seconds. With a large number of volumes the function can be stopped partway through, leaving some volumes without a snapshot and the backups inconsistent, so raise the timeout (the maximum is 15 minutes) to fit the workload.
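
One way to soften the timeout pitfall is to check the remaining execution time before each snapshot and stop cleanly, reporting what was skipped. The sketch below is not part of the original function. `snapshot_until_deadline`, `create_snapshot`, and `reserve_ms` are hypothetical names. `context.get_remaining_time_in_millis()` is the method the Lambda Python runtime provides on the context object.

```python
def snapshot_until_deadline(volume_ids, create_snapshot, context, reserve_ms=10_000):
    """Snapshot volumes until less than reserve_ms of execution time remains.

    create_snapshot is any callable that takes a volume ID and returns a
    snapshot ID (hypothetical wrapper around ec2.create_snapshot).
    Returns (created_snapshot_ids, skipped_volume_ids).
    """
    created, skipped = [], []
    for index, volume_id in enumerate(volume_ids):
        # Stop early rather than be cut off in the middle of an API call.
        if context.get_remaining_time_in_millis() < reserve_ms:
            skipped = list(volume_ids[index:])
            break
        created.append(create_snapshot(volume_id))
    return created, skipped
```

Logging the skipped volume IDs, or failing the invocation when the list is non-empty, makes an incomplete run visible instead of silent.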
