Additional Best Practices for AWS Control Tower
This document outlines additional best practices that can be implemented to further enhance the Control Tower deployment’s security, reliability, maintainability, and operational excellence.
Table of Contents
- Pre-Commit Hooks
- Terraform State Management
- Secrets Management
- Monitoring and Observability
- Disaster Recovery
- Cost Optimization
- Security Hardening
- Compliance and Governance
- Documentation
- CI/CD Enhancements
1. Pre-Commit Hooks
Implementation
Add pre-commit hooks to automatically validate code before commits.
Benefits:
- Catch errors early in development
- Enforce code quality standards
- Prevent committing sensitive data
- Ensure consistent formatting
Recommended Tools:
pre-commitframeworkterraform fmt- Format checkingterraform validate- Syntax validationtfsec- Security scanningtflint- Lintingdetect-secrets- Prevent credential leakscheckov- Policy as code
Files to Create:
.pre-commit-config.yaml.pre-commit-hooks/directory
Priority: HIGH
2. Terraform State Management
Current State
- S3 backend with native state locking (Terraform 1.6+)
- State stored in management account
Additional Best Practices
2.1 State File Encryption
Status: ✅ Implemented (KMS encryption)
2.2 State File Versioning
Status: ✅ Implemented (S3 versioning)
2.3 State File Backup Automation
Recommendation: Implement automated state backups
# Add to cron or GitHub Actions
#!/bin/bash
# scripts/backup-state-automated.sh
DATE=$(date +%Y%m%d_%H%M%S)
terraform state pull > "s3://backup-bucket/state-backups/terraform.tfstate.$DATE"
2.4 State File Access Logging
Recommendation: Enable CloudTrail logging for S3 state bucket
# Add to backend module
resource "aws_s3_bucket_logging" "state_bucket" {
bucket = aws_s3_bucket.terraform_state.id
target_bucket = aws_s3_bucket.logs.id
target_prefix = "state-access-logs/"
}
2.5 State Locking Monitoring
Recommendation: Monitor state lock duration and failures
Priority: MEDIUM
3. Secrets Management
Current State
- AWS credentials via environment variables or AWS CLI
- No secrets in code (good practice)
Additional Best Practices
3.1 AWS Secrets Manager Integration
Recommendation: Store sensitive variables in AWS Secrets Manager
# Example: Retrieve notification emails from Secrets Manager
data "aws_secretsmanager_secret_version" "notification_emails" {
secret_id = "control-tower/notification-emails"
}
locals {
notification_emails = jsondecode(data.aws_secretsmanager_secret_version.notification_emails.secret_string)
}
3.2 HashiCorp Vault Integration
Recommendation: For multi-cloud or hybrid environments
3.3 SOPS (Secrets OPerationS)
Recommendation: Encrypt sensitive tfvars files
# Encrypt production variables
sops -e terraform.tfvars.production > terraform.tfvars.production.enc
# Decrypt during deployment
sops -d terraform.tfvars.production.enc > terraform.tfvars.production
3.4 Git-Secrets
Recommendation: Prevent committing secrets
# Install git-secrets
brew install git-secrets
# Setup
git secrets --install
git secrets --register-aws
Priority: HIGH
4. Monitoring and Observability
Current State
- CloudWatch Logs for Control Tower
- CloudWatch Alarms for security events
- SNS notifications
Additional Best Practices
4.1 Terraform Deployment Metrics
Recommendation: Track deployment metrics
# Add to main.tf
resource "aws_cloudwatch_metric_alarm" "terraform_deployment_duration" {
alarm_name = "terraform-deployment-duration"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "1"
metric_name = "DeploymentDuration"
namespace = "ControlTower/Terraform"
period = "300"
statistic = "Average"
threshold = "3600" # 1 hour
alarm_description = "Terraform deployment taking too long"
alarm_actions = [aws_sns_topic.operational_notifications.arn]
}
4.2 Infrastructure Drift Detection
Recommendation: Automated drift detection
# .github/workflows/drift-detection.yml
name: Drift Detection
on:
schedule:
- cron: '0 */6 * * *' # Every 6 hours
jobs:
detect-drift:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Terraform Plan
run: terraform plan -detailed-exitcode
- name: Notify on Drift
if: failure()
run: |
# Send notification
4.3 Cost Monitoring
Recommendation: Track infrastructure costs
# Add AWS Cost Anomaly Detection
resource "aws_ce_anomaly_monitor" "control_tower" {
name = "control-tower-cost-monitor"
monitor_type = "DIMENSIONAL"
monitor_dimension = "SERVICE"
}
resource "aws_ce_anomaly_subscription" "control_tower" {
name = "control-tower-cost-alerts"
frequency = "DAILY"
monitor_arn_list = [
aws_ce_anomaly_monitor.control_tower.arn
]
subscriber {
type = "SNS"
address = aws_sns_topic.operational_notifications.arn
}
threshold_expression {
dimension {
key = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
values = ["100"] # Alert on $100+ anomalies
match_options = ["GREATER_THAN_OR_EQUAL"]
}
}
}
4.4 Centralized Logging
Recommendation: Aggregate logs from all accounts
Priority: MEDIUM
5. Disaster Recovery
Current State
- S3 versioning for state files
- Manual state backups via Makefile
Additional Best Practices
5.1 Cross-Region State Replication
Recommendation: Replicate state bucket to DR region
# Add to backend module
resource "aws_s3_bucket_replication_configuration" "state_replication" {
bucket = aws_s3_bucket.terraform_state.id
role = aws_iam_role.replication.arn
rule {
id = "replicate-state"
status = "Enabled"
destination {
bucket = aws_s3_bucket.terraform_state_replica.arn
storage_class = "STANDARD_IA"
encryption_configuration {
replica_kms_key_id = aws_kms_key.replica.arn
}
}
}
}
5.2 Disaster Recovery Runbook
Recommendation: Create detailed DR procedures
File: docs/DISASTER_RECOVERY.md
Contents:
- State file recovery procedures
- Control Tower rebuild process
- Account recovery procedures
- RTO/RPO definitions
- Contact information
5.3 Backup Testing
Recommendation: Regularly test state restoration
# scripts/test-state-restore.sh
#!/bin/bash
# Test state restoration in isolated environment
5.4 Multi-Region Deployment
Recommendation: Deploy critical infrastructure in multiple regions
Priority: MEDIUM
6. Cost Optimization
Current State
- Instance type restrictions for non-prod
- Optional expensive features (Macie, centralized networking)
Additional Best Practices
6.1 Resource Tagging for Cost Allocation
Recommendation: Enforce comprehensive tagging
# Add to variables.tf
variable "required_tags" {
description = "Required tags for all resources"
type = list(string)
default = ["Environment", "Project", "CostCenter", "Owner"]
}
# Add validation in locals.tf
resource "null_resource" "validate_tags" {
lifecycle {
precondition {
condition = alltrue([
for tag in var.required_tags :
contains(keys(var.default_tags), tag)
])
error_message = "Missing required tags: ${join(", ", var.required_tags)}"
}
}
}
6.2 AWS Budgets
Recommendation: Set up budget alerts
resource "aws_budgets_budget" "control_tower" {
name = "control-tower-monthly-budget"
budget_type = "COST"
limit_amount = "5000"
limit_unit = "USD"
time_period_start = "2024-01-01_00:00"
time_unit = "MONTHLY"
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.operational_notification_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = var.operational_notification_emails
}
}
6.3 Resource Scheduling
Recommendation: Auto-stop non-prod resources
# Add Lambda function to stop non-prod resources after hours
resource "aws_lambda_function" "resource_scheduler" {
filename = "lambda/resource-scheduler.zip"
function_name = "control-tower-resource-scheduler"
role = aws_iam_role.scheduler.arn
handler = "index.handler"
runtime = "python3.11"
environment {
variables = {
STOP_SCHEDULE = "0 19 * * MON-FRI" # 7 PM weekdays
START_SCHEDULE = "0 7 * * MON-FRI" # 7 AM weekdays
}
}
}
6.4 S3 Intelligent-Tiering
Recommendation: Use S3 Intelligent-Tiering for log buckets
Priority: MEDIUM
7. Security Hardening
Current State
- 35+ SCPs
- KMS encryption
- GuardDuty, Security Hub, Config
- Network Firewall
Additional Best Practices
7.1 AWS Systems Manager Session Manager
Recommendation: Replace SSH/RDP with Session Manager
# Add to security module
resource "aws_ssm_document" "session_manager_prefs" {
name = "SSM-SessionManagerRunShell"
document_type = "Session"
document_format = "JSON"
content = jsonencode({
schemaVersion = "1.0"
description = "Document to hold regional settings for Session Manager"
sessionType = "Standard_Stream"
inputs = {
s3BucketName = aws_s3_bucket.session_logs.id
s3EncryptionEnabled = true
cloudWatchLogGroupName = aws_cloudwatch_log_group.session_logs.name
cloudWatchEncryptionEnabled = true
kmsKeyId = aws_kms_key.main.id
runAsEnabled = false
runAsDefaultUser = ""
}
})
}
7.2 AWS WAF for API Gateway
Recommendation: Add WAF protection
resource "aws_wafv2_web_acl" "api_protection" {
name = "control-tower-api-protection"
scope = "REGIONAL"
default_action {
allow {}
}
rule {
name = "RateLimitRule"
priority = 1
action {
block {}
}
statement {
rate_based_statement {
limit = 2000
aggregate_key_type = "IP"
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "RateLimitRule"
sampled_requests_enabled = true
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "ControlTowerWAF"
sampled_requests_enabled = true
}
}
7.3 VPC Endpoints for AWS Services
Recommendation: Use VPC endpoints to avoid internet traffic
# Add to networking module
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.inspection.id
service_name = "com.amazonaws.${var.region}.s3"
route_table_ids = [
aws_route_table.private.id
]
tags = merge(var.tags, {
Name = "s3-endpoint"
})
}
7.4 AWS Shield Advanced
Recommendation: For DDoS protection (if needed)
7.5 AWS Inspector
Recommendation: Automated vulnerability scanning
resource "aws_inspector2_enabler" "control_tower" {
account_ids = [data.aws_caller_identity.current.account_id]
resource_types = ["EC2", "ECR", "LAMBDA"]
}
Priority: HIGH
8. Compliance and Governance
Current State
- AWS Config rules
- Security Hub standards
- SCP policies
Additional Best Practices
8.1 AWS Audit Manager
Recommendation: Automated compliance reporting
resource "aws_auditmanager_assessment" "control_tower" {
name = "control-tower-compliance"
assessment_reports_destination {
destination = "s3://${aws_s3_bucket.audit_reports.id}"
destination_type = "S3"
}
framework_id = "arn:aws:auditmanager:${var.region}::framework/AWS-Well-Architected-Framework"
roles {
role_arn = aws_iam_role.audit_manager.arn
role_type = "PROCESS_OWNER"
}
scope {
aws_accounts {
id = data.aws_caller_identity.current.account_id
}
aws_services {
service_name = "ec2"
}
}
}
8.2 Compliance Reporting
Recommendation: Automated compliance reports
# scripts/generate-compliance-report.sh
#!/bin/bash
# Generate compliance report from Security Hub, Config, etc.
8.3 Policy as Code Expansion
Recommendation: Add more OPA policies
New Policies:
- Data residency validation
- Tagging compliance
- Cost threshold validation
- Resource naming conventions
- Backup policy compliance
8.4 Change Management
Recommendation: Implement change approval workflow
# .github/workflows/change-approval.yml
name: Change Approval
on:
pull_request:
types: [opened, synchronize]
jobs:
require-approval:
runs-on: ubuntu-latest
steps:
- name: Check Approvals
uses: actions/github-script@v7
with:
script: |
const reviews = await github.rest.pulls.listReviews({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: context.issue.number
});
const approvals = reviews.data.filter(r => r.state === 'APPROVED');
if (approvals.length < 2) {
core.setFailed('Requires 2 approvals for infrastructure changes');
}
Priority: MEDIUM
9. Documentation
Current State
- Comprehensive inline comments
- Multiple documentation files in docs/
- README files in modules
Additional Best Practices
9.1 Architecture Diagrams
Recommendation: Create visual architecture diagrams
Tools:
- draw.io
- Lucidchart
- CloudCraft
- Terraform Graph
Diagrams Needed:
- Overall architecture
- Network topology
- Security controls
- Data flow
- Deployment pipeline
9.2 API Documentation
Recommendation: Document module interfaces
# Generate module documentation
terraform-docs markdown table --output-file README.md modules/security/
9.3 Runbooks
Recommendation: Create operational runbooks
Runbooks Needed:
- Incident response
- Disaster recovery
- Deployment procedures
- Troubleshooting guide
- Rollback procedures
9.4 Decision Records
Recommendation: Document architectural decisions
File: docs/adr/ (Architecture Decision Records)
# ADR-001: Use Terraform 1.6+ for Native S3 State Locking
## Status
Accepted
## Context
Need to manage Terraform state locking without DynamoDB.
## Decision
Use Terraform 1.6+ with native S3 state locking.
## Consequences
- Reduced cost (no DynamoDB)
- Simpler architecture
- Requires Terraform 1.6+
9.5 Video Tutorials
Recommendation: Create video walkthroughs
Priority: LOW
10. CI/CD Enhancements
Current State
- GitHub Actions workflow
- Validation, security scanning, OPA tests
- Manual approval for production
Additional Best Practices
10.1 Multi-Environment Pipelines
Recommendation: Separate pipelines per environment
# .github/workflows/deploy-dev.yml
# .github/workflows/deploy-staging.yml
# .github/workflows/deploy-prod.yml
10.2 Automated Rollback
Recommendation: Implement automatic rollback on failure
- name: Apply Terraform
id: apply
run: terraform apply -auto-approve
- name: Rollback on Failure
if: failure()
run: |
terraform state push backups/terraform.tfstate.backup
terraform apply -auto-approve
10.3 Canary Deployments
Recommendation: Gradual rollout for changes
10.4 Integration Tests
Recommendation: Add integration tests post-deployment
- name: Integration Tests
run: |
# Test Control Tower is accessible
aws organizations describe-organization
# Test OUs exist
aws organizations list-organizational-units-for-parent --parent-id $ROOT_ID
# Test SCPs are attached
aws organizations list-policies-for-target --target-id $OU_ID --filter SERVICE_CONTROL_POLICY
10.5 Deployment Notifications
Recommendation: Slack/Teams notifications
- name: Notify Slack
uses: 8398a7/action-slack@v3
with:
status: $
text: 'Control Tower deployment completed'
webhook_url: $
10.6 Terraform Cloud/Enterprise
Recommendation: Consider Terraform Cloud for:
- Remote execution
- Policy as code (Sentinel)
- Private module registry
- Cost estimation
- Team collaboration
Priority: MEDIUM
Implementation Priority Matrix
High Priority (Implement First)
- ✅ Pre-commit hooks
- ✅ Secrets management (AWS Secrets Manager)
- ✅ Security hardening (Session Manager, Inspector)
- ✅ Git-secrets
Medium Priority (Implement Next)
- ✅ Drift detection automation
- ✅ Cost monitoring and budgets
- ✅ Disaster recovery runbook
- ✅ Compliance reporting
- ✅ Architecture diagrams
Low Priority (Nice to Have)
- ⏳ Terraform Cloud migration
- ⏳ Video tutorials
- ⏳ Canary deployments
- ⏳ AWS Shield Advanced
Quick Wins (Easy to Implement)
- Add .pre-commit-config.yaml (30 minutes)
- Create DR runbook (2 hours)
- Add AWS Budgets (1 hour)
- Enable S3 Intelligent-Tiering (30 minutes)
- Add git-secrets (30 minutes)
- Create architecture diagram (2 hours)
- Add Slack notifications to CI/CD (1 hour)
Estimated Implementation Time
| Category | Time Required |
|---|---|
| Pre-commit Hooks | 2 hours |
| Secrets Management | 4 hours |
| Monitoring Enhancements | 8 hours |
| Disaster Recovery | 6 hours |
| Cost Optimization | 4 hours |
| Security Hardening | 8 hours |
| Compliance | 6 hours |
| Documentation | 12 hours |
| CI/CD Enhancements | 8 hours |
| Total | 58 hours |
Next Steps
- Review this document with the team
- Prioritize based on business needs
- Create implementation tickets
- Assign owners
- Set deadlines
- Track progress