Install
openclaw skills install afrexai-terraform-production
Complete Terraform & IaC production methodology: project structure, module design, state management, multi-environment deployment, security hardening, testing, CI/CD pipelines, cost optimization, and drift management. Use when designing infrastructure, writing Terraform, reviewing IaC, or managing cloud environments.
Complete 14-phase system for production-grade infrastructure as code. Zero dependencies; works with any cloud provider and any Terraform version (individual features, such as the built-in test framework, need TF 1.6+).
Run this 8-signal triage on any Terraform project:
| # | Signal | ✅ Healthy | 🔴 Fix Now |
|---|---|---|---|
| 1 | Remote state backend | S3/GCS/Azure Blob with locking | Local state or no locking |
| 2 | State encryption | Encrypted at rest + restricted access | Plain state, wide access |
| 3 | Module pinning | All modules version-pinned | Unpinned or ref=main |
| 4 | Provider pinning | required_providers with ~> constraints | No version constraints |
| 5 | Separate environments | Isolated state per env (dev/staging/prod) | Shared state or workspaces-as-envs |
| 6 | Plan before apply | CI runs plan, human approves, CI runs apply | Local apply without review |
| 7 | Secrets management | No secrets in .tf files; vault/SSM/secrets manager | Hardcoded secrets anywhere |
| 8 | Drift detection | Scheduled drift checks (weekly minimum) | No drift monitoring |
Score: __/16 (2 points per healthy signal). Below 10: stop and fix foundations first.
infrastructure/
├── modules/ # Reusable modules (internal registry)
│ ├── networking/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ ├── versions.tf
│ │ └── README.md
│ ├── compute/
│ ├── database/
│ └── monitoring/
├── environments/ # Environment-specific configs
│ ├── dev/
│ │ ├── main.tf # Module calls with dev params
│ │ ├── backend.tf # Dev state backend
│ │ ├── terraform.tfvars # Dev variable values
│ │ └── versions.tf
│ ├── staging/
│ └── prod/
├── global/ # Shared resources (IAM, DNS, etc.)
│ ├── iam/
│ ├── dns/
│ └── networking/
├── scripts/ # Helper scripts (import, migration)
├── policies/ # OPA/Sentinel policies
└── .github/workflows/ # CI/CD pipelines
- .terraform.lock.hcl committed: reproducible provider versions

| File | Purpose |
|---|---|
| main.tf | Primary resource definitions |
| variables.tf | All input variables |
| outputs.tf | All outputs |
| versions.tf | terraform and required_providers blocks |
| backend.tf | State backend configuration |
| locals.tf | Local values and computed expressions |
| data.tf | Data sources |
| providers.tf | Provider configuration (if complex) |
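The table lists versions.tf but the file itself is never shown. A minimal sketch of what it might contain follows; the version numbers are assumptions for illustration, not values taken from this guide, so substitute the releases you have actually tested.

```hcl
# versions.tf (illustrative; the version numbers are assumptions)
terraform {
  # Require a core version new enough for the features you rely on
  required_version = ">= 1.6.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # accepts any 5.x release, rejects 6.0
    }
  }
}
```

The `~>` operator keeps minor and patch upgrades flowing while blocking the next major version, which is the behavior signal 4 of the triage table asks for.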
# backend.tf
terraform {
backend "s3" {
bucket = "company-terraform-state"
key = "environments/prod/networking/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
kms_key_id = "alias/terraform-state"
}
}
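The backend block above assumes the bucket, lock table, and KMS key already exist. One way to bootstrap them is sketched below, covering the "encrypted at rest + restricted access" signal from the triage table. The resource labels and the dedicated KMS key are assumptions; the bucket and table names are reused from the backend example.

```hcl
# Hypothetical bootstrap configuration for the state backend itself.
resource "aws_kms_key" "state" {
  description         = "Encrypts Terraform state objects"
  enable_key_rotation = true
}

resource "aws_s3_bucket" "state" {
  bucket = "company-terraform-state"

  lifecycle {
    prevent_destroy = true # losing this bucket means losing all state
  }
}

# Versioning makes it possible to roll back to an earlier state object
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.state.arn
    }
  }
}

resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The S3 backend's DynamoDB locking expects a string hash key named LockID
resource "aws_dynamodb_table" "lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

This configuration has a chicken-and-egg problem: it is usually applied once with local state and then migrated into the bucket it created.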
{org}/{environment}/{component}/terraform.tfstate
Examples:
- acme/prod/networking/terraform.tfstate
- acme/prod/compute/terraform.tfstate
- acme/global/iam/terraform.tfstate

| Operation | Risk | Safe Approach |
|---|---|---|
| terraform state mv | Medium | Plan after to verify no changes |
| terraform state rm | High | Only to adopt resource elsewhere |
| terraform import | Medium | Write config first, import, plan to verify |
| terraform state pull | Low | For inspection only |
| terraform state push | CRITICAL | Almost never; breaks consistency |
| moved block | Low | Preferred over state mv: in config, reviewable |
# variables.tf — Module inputs
variable "name" {
description = "Name prefix for all resources"
type = string
validation {
condition = can(regex("^[a-z][a-z0-9-]{2,28}[a-z0-9]$", var.name))
error_message = "Name must be 4-30 chars, lowercase alphanumeric + hyphens."
}
}
variable "environment" {
description = "Deployment environment"
type = string
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be dev, staging, or prod."
}
}
variable "tags" {
description = "Common tags applied to all resources"
type = map(string)
default = {}
}
# outputs.tf — Module contract
output "vpc_id" {
description = "ID of the created VPC"
value = aws_vpc.main.id
}
output "private_subnet_ids" {
description = "List of private subnet IDs"
value = aws_subnet.private[*].id
}
# environments/prod/main.tf
module "networking" {
source = "../../modules/networking"
name = "prod"
environment = "prod"
vpc_cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
tags = local.common_tags
}
module "compute" {
source = "../../modules/compute"
name = "prod"
environment = "prod"
vpc_id = module.networking.vpc_id
private_subnet_ids = module.networking.private_subnet_ids
instance_type = "t3.large"
min_size = 3
max_size = 10
tags = local.common_tags
}
- for_each over count: stable resource addressing
- validation blocks catch errors at plan time
- ~> constraints for providers
- moved blocks for refactoring, not state mv
- examples/ directory with working configurations

| Aspect | Dev | Staging | Prod |
|---|---|---|---|
| Instance sizes | Small/micro | Match prod types | Right-sized |
| Replica count | 1 | 2 | 3+ (HA) |
| Multi-AZ | Optional | Yes | Yes |
| Backup retention | 1 day | 7 days | 30+ days |
| Monitoring | Basic | Full | Full + PagerDuty |
| Auto-scaling | Off | On | On |
| WAF/Shield | Off | On | On + Advanced |
| State access | Dev team | DevOps | DevOps only |
# modules/compute/variables.tf
variable "instance_type" {
type = string
default = "t3.micro" # Safe default
}
variable "min_size" {
type = number
default = 1
}
variable "enable_deletion_protection" {
type = bool
default = true # Safe default — must explicitly disable for dev
}
# environments/dev/terraform.tfvars
instance_type = "t3.micro"
min_size = 1
enable_deletion_protection = false
# environments/prod/terraform.tfvars
instance_type = "t3.large"
min_size = 3
enable_deletion_protection = true
dev → staging → prod
│ │ │
│ │ └─ Manual approval required
│ └─ Auto-apply after plan review
└─ Auto-apply on merge to dev branch
P0 (Mandatory):
- No secrets in .tf files, .tfvars, or state (use vault/SSM/secrets manager)
- prevent_destroy on critical resources (databases, S3 with data)
- .gitignore includes *.tfvars with secrets, .terraform/, *.tfstate*

P1 (Required):
- No 0.0.0.0/0 ingress (except ALB on 443)

P2 (Recommended):
- tfsec or checkov in CI pipeline

Need a secret in Terraform?
├── Runtime secret (app needs at runtime)
│ └── Use AWS Secrets Manager / HashiCorp Vault
│ └── Reference via data source, pass ARN to app
├── Terraform-time secret (provider needs it)
│ └── Environment variable (TF_VAR_xxx) or OIDC
└── Generated secret (Terraform creates it)
└── random_password resource → store in Secrets Manager
└── Mark output as sensitive = true
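The "generated secret" branch of the tree can be sketched as follows. The secret name and resource labels are hypothetical, and the example assumes the hashicorp/random and hashicorp/aws providers are configured.

```hcl
# Terraform generates the password; nobody ever types or sees it
resource "random_password" "db" {
  length  = 32
  special = true
}

resource "aws_secretsmanager_secret" "db" {
  name = "prod/database/master-password" # hypothetical secret name
}

resource "aws_secretsmanager_secret_version" "db" {
  secret_id     = aws_secretsmanager_secret.db.id
  secret_string = random_password.db.result
}

# Hand the ARN to the application, not the password itself
output "db_password_secret_arn" {
  description = "ARN of the Secrets Manager secret holding the DB password"
  value       = aws_secretsmanager_secret.db.arn
}

# If the raw value must be exposed, Terraform requires the output to be marked sensitive
output "db_password" {
  value     = random_password.db.result
  sensitive = true
}
```

The generated value is still written to the state file, which is one more reason the state backend needs encryption and restricted access.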
# No access keys needed
data "aws_iam_openid_connect_provider" "github" {
url = "https://token.actions.githubusercontent.com"
}
resource "aws_iam_role" "terraform_ci" {
name = "terraform-ci"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = { Federated = data.aws_iam_openid_connect_provider.github.arn }
Action = "sts:AssumeRoleWithWebIdentity"
Condition = {
StringEquals = {
"token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
}
StringLike = {
"token.actions.githubusercontent.com:sub" = "repo:org/infra:*"
}
}
}]
})
}
| Level | Tool | What It Tests | When |
|---|---|---|---|
| Static | terraform validate, tflint, tfsec, checkov | Syntax, best practices, security | Every commit |
| Plan | terraform plan + policy checks | Expected changes, no surprises | Every PR |
| Contract | terratest / tftest (TF 1.6+) | Module inputs/outputs, behavior | PR + nightly |
| Integration | terratest with real cloud | Actual infrastructure works | Nightly/weekly |
# tests/networking.tftest.hcl
run "creates_vpc_with_correct_cidr" {
command = plan
variables {
name = "test"
environment = "dev"
vpc_cidr = "10.0.0.0/16"
azs = ["us-east-1a"]
}
assert {
condition = aws_vpc.main.cidr_block == "10.0.0.0/16"
error_message = "VPC CIDR doesn't match input"
}
assert {
condition = aws_vpc.main.enable_dns_hostnames == true
error_message = "DNS hostnames should be enabled"
}
}
run "rejects_invalid_environment" {
command = plan
expect_failures = [var.environment]
variables {
name = "test"
environment = "invalid"
vpc_cidr = "10.0.0.0/16"
azs = ["us-east-1a"]
}
}
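The tests above run in plan mode against the real provider, so they need cloud credentials. Terraform 1.7 added provider mocking, which allows an apply-mode test without credentials or real resources. The sketch below assumes the same networking module and inputs as above; the file name is hypothetical.

```hcl
# tests/networking_mock.tftest.hcl (hypothetical file name, requires TF 1.7+)

# Replace the real AWS provider with a mock that fabricates computed values
mock_provider "aws" {}

run "exposes_vpc_id_without_cloud_access" {
  command = apply

  variables {
    name        = "test"
    environment = "dev"
    vpc_cidr    = "10.0.0.0/16"
    azs         = ["us-east-1a"]
  }

  assert {
    condition     = output.vpc_id != ""
    error_message = "vpc_id output should be populated"
  }
}
```

Mocked attribute values are generated placeholders, so this style of test checks wiring between inputs, resources, and outputs rather than real cloud behavior.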
- name: Terraform Lint & Security
  run: |
    terraform fmt -check -recursive
    terraform validate
    tflint --recursive
    tfsec .
    checkov -d . --framework terraform
- terraform test is built-in, use it
- defer cleanup to avoid orphaned resources
- infracost to catch expensive surprises

name: Terraform
on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']
permissions:
  id-token: write       # OIDC
  contents: read
  pull-requests: write  # PR comments
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.x"
      - run: terraform fmt -check -recursive
      - run: terraform init -backend=false
      - run: terraform validate
      # tflint and tfsec are not installed by setup-terraform; add install steps before these
      - run: tflint --recursive
      - run: tfsec . --soft-fail
  plan:
    needs: validate
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, prod]
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # <ACCOUNT_ID> is a placeholder for your AWS account ID
          role-to-assume: arn:aws:iam::<ACCOUNT_ID>:role/terraform-ci
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - working-directory: infrastructure/environments/${{ matrix.environment }}
        run: |
          terraform init
          terraform plan -out=tfplan -no-color
      - uses: actions/upload-artifact@v4
        with:
          name: tfplan-${{ matrix.environment }}
          path: infrastructure/environments/${{ matrix.environment }}/tfplan
  apply:
    if: github.ref == 'refs/heads/main'
    needs: plan
    runs-on: ubuntu-latest
    environment: production  # Requires approval
    strategy:
      matrix:
        environment: [dev, staging, prod]
      max-parallel: 1  # Sequential: dev → staging → prod
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # <ACCOUNT_ID> is a placeholder for your AWS account ID
          role-to-assume: arn:aws:iam::<ACCOUNT_ID>:role/terraform-ci
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - uses: actions/download-artifact@v4
        with:
          name: tfplan-${{ matrix.environment }}
          path: infrastructure/environments/${{ matrix.environment }}
      - working-directory: infrastructure/environments/${{ matrix.environment }}
        run: |
          # A fresh checkout has no .terraform directory, so init before applying the saved plan
          terraform init
          terraform apply tfplan
- No apply from local machines: CI/CD only
- Scheduled plan to detect manual changes

locals {
common_tags = {
Project = var.project_name
Environment = var.environment
ManagedBy = "terraform"
Team = var.team
CostCenter = var.cost_center
Repository = "github.com/org/infrastructure"
}
}
# Apply to all resources
resource "aws_instance" "app" {
# ...
tags = merge(local.common_tags, {
Name = "${var.name}-app"
Role = "application"
})
}
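With the AWS provider, an alternative to calling merge() on every resource is the provider-level default_tags block, sketched below. It assumes the common_tags local defined above; whether it fits depends on your provider version and on which resources you manage.

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource created through this provider
  default_tags {
    tags = local.common_tags
  }
}

resource "aws_instance" "app" {
  # ...
  # Only resource-specific tags are needed here; common tags are inherited
  tags = {
    Name = "${var.name}-app"
    Role = "application"
  }
}
```

The explicit merge() pattern is still useful for resources or providers that do not honor default tags.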
{project}-{environment}-{component}-{qualifier}
Examples: acme-prod-vpc-main, acme-staging-rds-primary, acme-prod-alb-api
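One way to keep names consistent with this convention is to build them once in locals instead of repeating string interpolation in every resource. The variable and local names below are hypothetical.

```hcl
variable "project" {
  type = string # e.g. "acme"
}

variable "environment" {
  type = string # e.g. "prod"
}

locals {
  # {project}-{environment}
  name_prefix = "${var.project}-${var.environment}"

  # {project}-{environment}-{component}-{qualifier}
  names = {
    vpc = "${local.name_prefix}-vpc-main"
    rds = "${local.name_prefix}-rds-primary"
    alb = "${local.name_prefix}-alb-api"
  }
}
```

With project set to acme and environment set to prod, local.names.vpc evaluates to acme-prod-vpc-main, matching the first example above.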
Conditional Resource Creation:
resource "aws_cloudwatch_metric_alarm" "cpu" {
count = var.environment == "prod" ? 1 : 0
# Only create alarms in prod
}
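The count toggle works well for a single optional resource. When there are several instances, the for_each form recommended elsewhere in this guide keeps addresses stable. The sketch below assumes the environment variable defined earlier; alarm_services and the alarm settings are hypothetical.

```hcl
variable "alarm_services" {
  description = "Hypothetical: services that should get a CPU alarm"
  type        = set(string)
  default     = ["api", "worker"]
}

resource "aws_cloudwatch_metric_alarm" "cpu" {
  # An empty set outside prod means no alarms are created
  for_each = var.environment == "prod" ? var.alarm_services : toset([])

  alarm_name          = "${each.key}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 80
}
```

Instances are addressed by key, for example aws_cloudwatch_metric_alarm.cpu["api"], so removing one service from the set does not renumber or recreate the others.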
Dynamic Blocks:
resource "aws_security_group" "app" {
name = "${var.name}-app"
vpc_id = var.vpc_id
dynamic "ingress" {
for_each = var.ingress_rules
content {
from_port = ingress.value.port
to_port = ingress.value.port
protocol = "tcp"
cidr_blocks = ingress.value.cidrs
description = ingress.value.description
}
}
}
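The dynamic block reads port, cidrs, and description from each element of var.ingress_rules, but the variable is not declared in the example. A declaration consistent with those attribute names could look like this; the empty default is an assumption.

```hcl
variable "ingress_rules" {
  description = "Ingress rules rendered by the dynamic block"
  type = list(object({
    port        = number
    cidrs       = list(string)
    description = string
  }))
  default = []

  # Example value for terraform.tfvars:
  # ingress_rules = [
  #   { port = 443, cidrs = ["10.0.0.0/16"], description = "HTTPS from inside the VPC" },
  # ]
}
```

Typing the variable as an object list means a misspelled or missing attribute fails at plan time instead of producing a malformed rule.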
Data Source for Cross-Stack References:
# Instead of hardcoding VPC ID
data "terraform_remote_state" "networking" {
backend = "s3"
config = {
bucket = "company-terraform-state"
key = "environments/prod/networking/terraform.tfstate"
region = "us-east-1"
}
}
# Use: data.terraform_remote_state.networking.outputs.vpc_id
# .github/workflows/drift-detection.yml
name: Drift Detection
on:
  schedule:
    - cron: '0 8 * * 1'  # Weekly Monday 8 AM UTC
permissions:
  id-token: write  # Required for OIDC role assumption
  contents: read
jobs:
  detect:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, prod]
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # <ACCOUNT_ID> is a placeholder for your AWS account ID
          role-to-assume: arn:aws:iam::<ACCOUNT_ID>:role/terraform-ci
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - working-directory: infrastructure/environments/${{ matrix.environment }}
        run: |
          terraform init
          terraform plan -detailed-exitcode -no-color 2>&1 | tee plan.txt
          # PIPESTATUS[0] is terraform's exit code; $? would report tee's
          EXIT_CODE=${PIPESTATUS[0]}
          if [ "$EXIT_CODE" -eq 2 ]; then
            echo "::warning::Drift detected in ${{ matrix.environment }}"
            # Send Slack alert
          elif [ "$EXIT_CODE" -ne 0 ]; then
            exit "$EXIT_CODE"  # 1 means the plan itself failed
          fi
| Drift Type | Response |
|---|---|
| Manual console change (cosmetic) | Import or update config to match |
| Manual console change (critical) | Investigate who/why, then align |
| Auto-scaling / ASG changes | Expected — use ignore_changes for dynamic attributes |
| AWS service updates | Update provider version, review changelog |
| Security group modified manually | 🚨 Security incident — investigate immediately |
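The auto-scaling row is the typical case for ignore_changes. A minimal sketch follows, assuming an Auto Scaling group whose desired capacity is adjusted by scaling policies at runtime; the names and input variables are hypothetical.

```hcl
variable "private_subnet_ids" {
  type = list(string)
}

variable "launch_template_id" {
  type = string
}

resource "aws_autoscaling_group" "app" {
  name                = "acme-prod-asg-app" # hypothetical name
  min_size            = 3
  max_size            = 10
  desired_capacity    = 3
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    # Scaling policies move desired_capacity at runtime; do not report that as drift
    ignore_changes = [desired_capacity]
  }
}
```

Listing only the one attribute keeps every other change to the group visible in plans; ignore_changes = all would hide real drift.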
ignore_changes Decision Guide
Use ignore_changes ONLY for:
Never ignore_changes for:
- name: Infracost
  run: |
    infracost breakdown --path infrastructure/environments/prod/ \
      --format json --out-file infracost.json
    infracost output --path infracost.json --format github-comment \
      --out-file comment.md
    # Post as PR comment
| Strategy | Savings | Implementation |
|---|---|---|
| Reserved Instances / Savings Plans | 30-60% | Annual commitment for stable workloads |
| Right-sizing | 20-40% | Monitor CPU/memory, downsize over-provisioned |
| Spot/Preemptible for non-critical | 60-90% | Batch jobs, dev environments |
| S3 lifecycle policies | 20-50% storage | Transition to IA → Glacier → delete |
| NAT Gateway alternatives | $30-100/mo per GW | NAT instances for dev, VPC endpoints |
| Dev environment scheduling | 60-70% | Destroy nights/weekends, recreate on demand |
| Unused resource cleanup | Variable | Tag with TTL, auto-delete untagged after 7 days |
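The S3 lifecycle row (IA → Glacier → delete) could be expressed roughly as follows. The bucket name and the day thresholds are assumptions chosen for illustration; pick values that match your retention requirements.

```hcl
resource "aws_s3_bucket" "logs" {
  bucket = "acme-prod-s3-logs" # hypothetical bucket name
}

resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "archive-then-expire"
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket
    filter {}

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }
}
```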
Required cost tags (enforce via policy):
- CostCenter: maps to business unit
- Environment: dev/staging/prod
- Project: which project owns this
- Team: responsible team
- ManagedBy: terraform/manual/other

When you have 5+ environments with identical module structures, Terragrunt eliminates repetition:
# terragrunt.hcl (root)
remote_state {
backend = "s3"
generate = { path = "backend.tf", if_exists = "overwrite_terragrunt" }
config = {
bucket = "company-terraform-state"
key = "${path_relative_to_include()}/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
}
}
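Each component directory then needs only a small child file that includes the root configuration. The sketch below assumes the root terragrunt.hcl sits in infrastructure/ and the child lives at the hypothetical path shown; newer Terragrunt releases may prefer a differently named root file passed explicitly to find_in_parent_folders.

```hcl
# environments/prod/networking/terragrunt.hcl (hypothetical path)

# Pull in remote_state and any other shared settings from the root file
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/networking"
}

inputs = {
  name        = "prod"
  environment = "prod"
  vpc_cidr    = "10.0.0.0/16"
  azs         = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
```

Under that layout, path_relative_to_include() resolves to environments/prod/networking, so the generated state key follows the directory structure without being written by hand.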
# Declarative import — reviewable in PR
import {
to = aws_s3_bucket.existing
id = "my-existing-bucket"
}
resource "aws_s3_bucket" "existing" {
bucket = "my-existing-bucket"
# Write config to match existing resource
}
provider "aws" {
region = "us-east-1"
}
provider "aws" {
alias = "eu"
region = "eu-west-1"
}
module "eu_networking" {
source = "../../modules/networking"
providers = { aws = aws.eu }
# ...
}
moved Block for Refactoring
# Rename without destroy+create
moved {
from = aws_instance.app
to = aws_instance.application
}
# Move into module
moved {
from = aws_instance.app
to = module.compute.aws_instance.app
}
# Enable versioning on state bucket (BEFORE you need it)
aws s3api put-bucket-versioning \
--bucket company-terraform-state \
--versioning-configuration Status=Enabled
# List state versions
aws s3api list-object-versions \
--bucket company-terraform-state \
--prefix environments/prod/networking/terraform.tfstate
# Restore previous version
aws s3api get-object \
--bucket company-terraform-state \
--key environments/prod/networking/terraform.tfstate \
--version-id "versionId123" \
restored-state.tfstate
1. terraform state pull > backup.tfstate: back up current state
2. Update backend.tf with new backend config
3. terraform init -migrate-state: Terraform copies state
4. terraform plan: verify no changes (state matches)

When upgrading major Terraform or provider versions:
- Update .terraform.lock.hcl
- Run terraform plan in all environments

| Dimension | Weight | Score Range |
|---|---|---|
| State management | 20% | 0-20 |
| Security posture | 20% | 0-20 |
| Module design | 15% | 0-15 |
| Testing coverage | 15% | 0-15 |
| CI/CD automation | 10% | 0-10 |
| Documentation | 10% | 0-10 |
| Cost governance | 5% | 0-5 |
| Drift management | 5% | 0-5 |
Scoring Guide:
- for_each over count: stable addressing saves you
- terraform apply in prod

| Mistake | Impact | Fix |
|---|---|---|
| Local state for team projects | State conflicts, data loss | Remote backend day 1 |
| Secrets in .tfvars committed to git | Credential exposure | Use vault/SSM + env vars |
| count for optional resources | Index shift on removal | for_each with conditional map |
| Monolithic state file | Slow plans, blast radius | Split by component (networking/compute/data) |
| No prevent_destroy on data stores | Accidental database deletion | Lifecycle rule on stateful resources |
| Unpinned module versions | Breaking changes on init | Pin with ?ref=v1.2.3 or version = "~> 1.2" |
| terraform apply -auto-approve in prod | Unreviewed changes | Plan artifact → human review → apply |
| Using workspaces as environments | Shared state, shared blast radius | Separate directories + backends per env |
| No cost estimation in CI | $10K surprise bills | Infracost or similar on every PR |
| Manual changes "just this once" | Permanent drift | Always go through code, even for emergencies |
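The "lifecycle rule on stateful resources" fix from the table can be sketched for a database as follows. All identifiers and sizes are hypothetical, and manage_master_user_password assumes a reasonably recent AWS provider.

```hcl
resource "aws_db_instance" "primary" {
  identifier        = "acme-prod-rds-primary" # hypothetical
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 50
  username          = "app"

  # Let RDS keep the master password in Secrets Manager instead of in config
  manage_master_user_password = true

  # API-level guard: AWS refuses delete calls while this is true
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "acme-prod-rds-primary-final"

  lifecycle {
    # Plan-level guard: Terraform errors on any plan that would destroy this resource
    prevent_destroy = true
  }
}
```

The two guards cover different failure modes. prevent_destroy stops a plan that would replace or destroy the instance, but it stops protecting the resource if the whole block is deleted from configuration, which is where deletion_protection still applies.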
This skill covers Terraform methodology and best practices. For industry-specific infrastructure patterns:
- clawhub install afrexai-devops-engine (Complete DevOps & Platform Engineering)
- clawhub install afrexai-cybersecurity-engine (Security Hardening & Compliance)
- clawhub install afrexai-system-architect (System Architecture Decision Frameworks)
- clawhub install afrexai-api-architect (API Design & Lifecycle Management)
- clawhub install afrexai-cicd-engineering (CI/CD Pipeline Engineering)

Browse all AfrexAI skills: clawhub.com → Search "afrexai"
Storefront: afrexai-cto.github.io/context-packs