Amazon SNS Data Protection Policies Block, Mask, or Log Sensitive Data with 99% Sample Rate
These articles are AI-generated summaries. Please check the original sources for full details.
Data Protection in Amazon SNS
Amazon SNS data protection policies detect PII and PHI in messages using machine learning and pattern matching. The audit policy shown in this article (from 2025) samples 99% of messages and logs its findings to CloudWatch Logs.
Why This Matters
Event-driven architectures prioritize speed, but leaks of sensitive data such as PII and PHI create compliance risk. Manual masking and custom detection pipelines are error-prone and costly. SNS Data Protection automates this with predefined data identifiers (e.g., email address, date of birth) and three operations: audit, de-identify, and deny. A 2022 study estimated data breach costs at $4.2M per incident, which makes automated controls critical.
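To make the three operations concrete, here is a minimal Python sketch of the "Operation" fragment each one contributes to a policy statement. The helper name `operation_block` and its parameters are hypothetical. The `Audit` shape follows the working example in this article; the `Deidentify` and `Deny` shapes reflect my understanding of the SNS policy format and should be checked against the AWS documentation before use.

```python
import json


def operation_block(kind, log_group="/aws/vendedlogs/sns-audit/",
                    sample_rate=99, mask_char="#"):
    """Return the "Operation" fragment for one data protection operation.

    Hypothetical helper: the key names are assumptions based on the
    policy format, not an official SDK call.
    """
    if kind == "audit":
        # Audit records findings for a sample of messages; delivery is unchanged.
        return {
            "Audit": {
                "SampleRate": str(sample_rate),
                "FindingsDestination": {
                    "CloudWatchLogs": {"LogGroup": log_group}
                },
            }
        }
    if kind == "deidentify":
        # De-identify replaces each detected value with a mask character.
        return {"Deidentify": {"MaskConfig": {"MaskWithCharacter": mask_char}}}
    if kind == "deny":
        # Deny rejects messages that contain the listed identifiers.
        return {"Deny": {}}
    raise ValueError(f"unknown operation: {kind!r}")


if __name__ == "__main__":
    for kind in ("audit", "deidentify", "deny"):
        print(kind, json.dumps(operation_block(kind)))
```

Each fragment goes under the "Operation" key of a statement, next to "DataDirection", "DataIdentifier", and "Principal".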
Key Insights
- The 2025 Audit policy example uses a 99% sample rate, with findings sent to CloudWatch Logs.
- For PII in healthcare apps, the De-identify operation is preferred over manual masking.
- Developers can use a Lambda function to test SNS data protection policies.
Working Example
{
  "Description": "Audit sensitive data without blocking delivery",
  "Version": "2021-06-01",
  "Statement": [
    {
      "DataDirection": "Inbound",
      "DataIdentifier": [
        "arn:aws:dataprotection::aws:data-identifier/EmailAddress",
        "arn:aws:dataprotection::aws:data-identifier/DateOfBirth",
        "arn:aws:dataprotection::aws:data-identifier/CreditCardNumber"
      ],
      "Operation": {
        "Audit": {
          "FindingsDestination": {
            "CloudWatchLogs": {
              "LogGroup": "/aws/vendedlogs/sns-audit/"
            }
          },
          "SampleRate": "99"
        }
      },
      "Principal": ["*"],
      "Sid": "AuditSensitiveData"
    }
  ],
  "Name": "sns-audit-policy"
}
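One way to attach a policy like the one above to a topic is the boto3 SNS client's `put_data_protection_policy` call, which takes the topic ARN and the policy as a JSON string. The sketch below is illustrative only: `attach_policy`, `RecordingClient`, and the example topic ARN are hypothetical names, and the stand-in client exists so the code runs without AWS credentials.

```python
import json

# Same audit policy as the working example, expressed as a Python dict.
AUDIT_POLICY = {
    "Name": "sns-audit-policy",
    "Description": "Audit sensitive data without blocking delivery",
    "Version": "2021-06-01",
    "Statement": [
        {
            "Sid": "AuditSensitiveData",
            "DataDirection": "Inbound",
            "Principal": ["*"],
            "DataIdentifier": [
                "arn:aws:dataprotection::aws:data-identifier/EmailAddress",
                "arn:aws:dataprotection::aws:data-identifier/DateOfBirth",
                "arn:aws:dataprotection::aws:data-identifier/CreditCardNumber",
            ],
            "Operation": {
                "Audit": {
                    "SampleRate": "99",
                    "FindingsDestination": {
                        "CloudWatchLogs": {"LogGroup": "/aws/vendedlogs/sns-audit/"}
                    },
                }
            },
        }
    ],
}


def attach_policy(sns_client, topic_arn, policy):
    """Serialize the policy and attach it to the topic (hypothetical helper)."""
    body = json.dumps(policy)
    sns_client.put_data_protection_policy(
        ResourceArn=topic_arn,
        DataProtectionPolicy=body,
    )
    return body


class RecordingClient:
    """Stand-in for boto3.client("sns") that only records the call."""

    def __init__(self):
        self.calls = []

    def put_data_protection_policy(self, **kwargs):
        self.calls.append(kwargs)


if __name__ == "__main__":
    # Dry run; with real credentials, pass boto3.client("sns") instead.
    client = RecordingClient()
    attach_policy(client, "arn:aws:sns:us-east-1:123456789012:example-topic", AUDIT_POLICY)
    print(client.calls[0]["ResourceArn"])
```

Passing the client as an argument keeps the helper easy to exercise in tests before it touches a real topic.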
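The Lambda function below publishes the same test message to three topics, one per policy operation, and records the outcome for each:

import json
import logging
import os

import boto3

sns = boto3.client('sns')
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    # Test payload containing sample PII/PHI (name, date of birth, diagnosis).
    message = {
        "patientId": "PAT123456",
        "name": "John Doe",
        "dob": "12-01-2012",
        "diagnosis": "Flu"
    }

    # Names of the environment variables that hold each test topic's ARN.
    topics = ["AUDIT_TOPIC_ARN", "DEIDENTIFY_TOPIC_ARN", "DENY_TOPIC_ARN"]
    results = {}

    for topic_env in topics:
        topic_arn = os.environ.get(topic_env)
        if not topic_arn:
            # Skip unconfigured topics instead of publishing with TopicArn=None.
            results[topic_env] = {"status": "skipped", "error": "environment variable not set"}
            continue
        try:
            response = sns.publish(
                TopicArn=topic_arn,
                Message=json.dumps(message)
            )
            results[topic_env] = {
                "status": "success",
                "messageId": response.get("MessageId")
            }
            logger.info(f"Published to {topic_env}: {response.get('MessageId')}")
        except (sns.exceptions.InvalidParameterException,
                sns.exceptions.AuthorizationErrorException) as e:
            # The source catches InvalidParameterException. AuthorizationErrorException
            # is added on the assumption that a Deny policy rejects the publish
            # as an authorization error.
            logger.error(f"[{topic_env}] Sensitive data detected: {str(e)}")
            results[topic_env] = {"status": "failed", "error": "Sensitive data not allowed"}
        except Exception as e:
            logger.error(f"[{topic_env}] Error: {str(e)}")
            results[topic_env] = {"status": "failed", "error": str(e)}

    return {"status": "completed", "results": results}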
Practical Applications
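- Use Case: Healthcare apps can use De-identify to mask a patient's date of birth before messages reach analytics systems.
- Pitfall: Audit only records findings and does not modify or block the message, so relying on Audit without De-identify can leave sensitive data exposed to subscribers and in their logs.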
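The healthcare use case above could be expressed as a policy that masks dates of birth on delivery. The following is a minimal sketch under stated assumptions: `build_deidentify_policy` is a hypothetical helper, the policy name is made up for illustration, and the `Deidentify`/`MaskConfig` keys and the "Outbound" direction reflect my reading of the SNS policy format rather than anything shown in the source.

```python
import json

# Prefix used by the managed data identifiers in the working example.
IDENTIFIER_PREFIX = "arn:aws:dataprotection::aws:data-identifier/"


def build_deidentify_policy(name, identifiers, mask_char="#", direction="Outbound"):
    """Build a policy that masks the given identifiers (hypothetical helper)."""
    if direction not in ("Inbound", "Outbound"):
        raise ValueError("direction must be 'Inbound' or 'Outbound'")
    if len(mask_char) != 1:
        raise ValueError("mask_char must be a single character")
    return {
        "Name": name,
        "Description": "Mask sensitive values before delivery",
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": "MaskSensitiveData",
                "DataDirection": direction,
                "Principal": ["*"],
                "DataIdentifier": [IDENTIFIER_PREFIX + i for i in identifiers],
                # Assumed shape of the De-identify operation with masking.
                "Operation": {
                    "Deidentify": {"MaskConfig": {"MaskWithCharacter": mask_char}}
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_deidentify_policy("sns-mask-dob-policy", ["DateOfBirth"])
    print(json.dumps(policy, indent=2))
```

Applying the mask on the outbound direction leaves the published message untouched and alters only what subscribers receive. Whether that fits depends on which systems are allowed to see the raw value.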