Install

```
openclaw skills install terraform-plan-reviewer
```

Review terraform plan output for risk before apply. Detects destructive changes (force replace and destroy on stateful resources), drift from real state, missing imports, secrets leaked into outputs or state, IAM permission widening, public-internet exposure on storage and security groups, and provider-specific footguns across AWS, GCP, and Azure. Produces a P0/P1/P2 severity matrix, applies gating rules (block, require-approval, allow), and writes a PR-comment-style review with line citations and suggested fixes. Use when asked to review a terraform plan, gate a CI apply, audit a Terragrunt run, check infrastructure changes for risk, or set up a plan-review GitHub Action. Triggers on "terraform plan", "tf plan", "terragrunt", "tofu plan", "infrastructure as code", "iac review", "plan output", "destroy", "force replace", "iam widening", "drift", "tfstate", "atlantis", "spacelift", "env0".

Reviews a terraform plan (or tofu plan, or a Terragrunt run-all) and decides whether the apply is safe. Categorizes every change by destructiveness, drift, secret exposure, IAM blast radius, and provider-specific risk. Returns a PR-comment-style review with severity grades, gating decision, and line-cited suggested fixes. Acts as a senior platform engineer who has approved thousands of applies and reverted the bad ones.
Invoke this skill before any non-trivial terraform apply, in CI as a gate on PRs, or as a manual second opinion on an environment-affecting change.
Basic invocation:
- "Review this terraform plan: [paste plan output]"
- "Gate this PR — should we apply or block?"
- "Audit my Terragrunt run-all output for drift"
With context:
- "Here's the plan and the previous state file — flag any IAM widening"
- "We're about to apply this to prod — what's the blast radius?"
- "Generate a GitHub Action that runs this review on every PR"
The agent produces a graded review (P0/P1/P2 findings), a single apply gating decision (BLOCK / APPROVE-WITH-CONDITIONS / APPROVE), and per-finding suggested fixes with line numbers from the plan.
The agent normalizes input from any of:
- terraform plan output (most common)
- terraform show -json plan.out (preferred — unambiguous)
- terragrunt run-all plan output
- tofu plan (treated identically to terraform)

JSON form is preferred. When only text is available, the agent extracts:
+ create = N resources
~ update = N resources
-/+ replace = N resources (destroy then create)
+/- replace = N resources (create then destroy — usually safer)
- destroy = N resources
<= read = N data sources
The agent also captures resource addresses (module.x.aws_iam_role.y), change reasons (# forces replacement), and any drift markers (# (config refers to values not yet known), Note: Objects have changed outside of Terraform).
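The extraction step above can be sketched with a few regexes. This is a minimal illustration, not the skill's actual parser; the resource names in the sample excerpt are invented.

```python
import re

# One regex per plan action symbol; the symbols are mutually
# exclusive at the start of a resource header line.
ACTIONS = {
    "create":  re.compile(r"^\s*\+ resource "),
    "update":  re.compile(r"^\s*~ resource "),
    "destroy": re.compile(r"^\s*- resource "),
    "replace_destroy_first": re.compile(r"^\s*-/\+ resource "),
    "replace_create_first":  re.compile(r"^\s*\+/- resource "),
}

def summarize(plan_text: str) -> dict:
    """Count changes and flag replacement/drift markers in text plan output."""
    counts = {name: 0 for name in ACTIONS}
    counts["forces_replacement"] = plan_text.count("# forces replacement")
    counts["drift"] = "changed outside of Terraform" in plan_text
    for line in plan_text.splitlines():
        for name, pat in ACTIONS.items():
            if pat.search(line):
                counts[name] += 1
                break
    return counts

sample = """\
  # aws_s3_bucket.logs will be created
  + resource "aws_s3_bucket" "logs" {
  # aws_db_instance.main must be replaced
-/+ resource "aws_db_instance" "main" { # forces replacement
  # aws_iam_role.ci will be updated in-place
  ~ resource "aws_iam_role" "ci" {
"""

print(summarize(sample))
```

The JSON form skips all of this: `resource_changes[].change.actions` carries the same information unambiguously.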
Every change in the plan gets exactly one severity. The matrix is the contract:
| Severity | Meaning | CI Default |
|---|---|---|
| P0 — Block | Apply will cause data loss, outage, security breach, or unauthorized access. | Hard block. Human override required. |
| P1 — Require Approval | High risk; reasonable in some contexts but needs second pair of eyes. | Require named reviewer + comment justification. |
| P2 — Advisory | Low risk; flagged for awareness. | Allow apply; record in review. |
| OK | Standard, low-risk change. | No flag. |
The agent runs every change through a layered ruleset. A single change may fire multiple rules; the highest severity wins.
D — Destructive change rules:
D1. destroy on aws_db_instance, aws_rds_cluster, google_sql_database_instance,
    azurerm_mssql_database, azurerm_postgresql_server → P0
D2. destroy on aws_s3_bucket, google_storage_bucket, azurerm_storage_account
    when not empty (force_destroy=false) → P0
D3. destroy on stateful workloads (statefulset, persistent volume,
elasticache, redis, kafka, mq broker) → P0
D4. -/+ replace on a database, cache, queue, or persistent disk → P0
D5. -/+ replace on a load balancer with active traffic → P1
D6. -/+ replace on an IAM role / service account → P1
(breaks every principal that assumes it)
D7. destroy on a route53 / cloud DNS zone → P0
D8. destroy on a KMS key, cmek, key vault key → P0
(data encrypted with it becomes unrecoverable)
D9. destroy + recreate on a network resource carrying production traffic
(vpc, subnet, vpc peering, transit gateway, vpn) → P0
D10. lifecycle.prevent_destroy=true bypassed by a destroy → P0
S — Secret / output leak rules:
S1. output without `sensitive = true` containing a value matching
password|secret|token|key|credential|api[_-]?key → P1
S2. resource attribute set from a hardcoded string matching the
secret regex (in plan diff, not just var) → P0
S3. user_data / cloud-init / startup script contains a credential
string visible in plan diff → P0
S4. tfvars/state file referenced from a public S3 bucket / public
storage container / public repo → P0
S5. KMS / KeyVault key policy widens decrypt to "*" → P0
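The S1 name check can be sketched as follows. This is illustrative only, and assumes outputs have already been extracted from plan JSON into a simple dict; the real rule also inspects values in the plan diff, not just output names.

```python
import re

# S1: output names that suggest a secret and are not marked sensitive -> P1.
SECRET_NAME = re.compile(r"password|secret|token|key|credential|api[_-]?key",
                         re.IGNORECASE)

def flag_unmarked_secret_outputs(outputs: dict) -> list:
    """outputs: name -> {'sensitive': bool}. Returns names firing S1."""
    return [name for name, meta in outputs.items()
            if SECRET_NAME.search(name) and not meta.get("sensitive", False)]

outputs = {
    "db_password": {"sensitive": False},  # secret-like and unmarked: fires
    "api_key":     {"sensitive": True},   # marked sensitive: OK
    "vpc_id":      {"sensitive": False},  # not secret-like: OK
}
print(flag_unmarked_secret_outputs(outputs))  # ['db_password']
```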
I — IAM widening rules:
I1. Action: "*" or Resource: "*" on a new policy statement → P0
I2. Wildcard service principal in trust policy
(Principal: {"AWS": "*"} or sts:AssumeRoleWithSAML to *) → P0
I3. iam:PassRole with Resource:"*" and no condition on iam:PassedToService → P0
I4. s3:* / dynamodb:* / kms:* added to a previously scoped policy → P1
I5. AdministratorAccess / Owner / Editor role assignment to a user
that previously had a narrower role → P1
I6. GCP roles/iam.serviceAccountUser added without resource restriction → P1
I7. Azure RoleAssignment scope = subscription or management-group → P1
I8. Conditions removed from an existing IAM policy (less restriction) → P1
I9. MFA condition removed from an assume-role trust policy → P0
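The two P0 wildcard rules above reduce to simple structural checks on a policy statement. A minimal sketch, assuming the statement has been parsed from the policy JSON in the plan diff (the real ruleset also diffs against the previous policy for I4-I9):

```python
def iam_findings(statement: dict) -> list:
    """Grade one IAM policy statement against I1/I2 (sketch only)."""
    findings = []
    actions = statement.get("Action", [])
    resources = statement.get("Resource", [])
    # IAM allows either a string or a list; normalize to lists.
    actions = [actions] if isinstance(actions, str) else actions
    resources = [resources] if isinstance(resources, str) else resources
    if "*" in actions or "*" in resources:       # I1: full wildcard grant
        findings.append(("I1", "P0"))
    principal = statement.get("Principal")
    if principal == "*" or (isinstance(principal, dict)
                            and "*" in principal.values()):
        findings.append(("I2", "P0"))            # I2: wildcard principal
    return findings

print(iam_findings({"Effect": "Allow", "Action": "*", "Resource": "*"}))
```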
N — Public exposure rules:
N1. S3 bucket public-access-block disabled OR ACL set to public-read → P0
N2. RDS / Cloud SQL / Azure SQL publicly_accessible = true → P0
N3. Security group ingress 0.0.0.0/0 on port not in {80, 443} → P0
N4. GCS bucket IAM grants `allUsers` or `allAuthenticatedUsers` → P0
N5. Azure Storage account public_network_access = "Enabled" → P1
N6. ELB / ALB scheme = internet-facing on a service that previously was
internal → P1
N7. Cloud Functions / Lambda invocation policy adds Principal: "*" → P0
N8. GKE / EKS / AKS cluster endpoint becomes public → P0
N9. VPC subnet route added to internet gateway with 0.0.0.0/0 → P1
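N3 is the rule most teams trip over, so here is the logic spelled out as a sketch. It assumes ingress rules have been flattened into dicts from the plan diff; a port range fires if any port in it falls outside {80, 443}.

```python
def n3_findings(ingress_rules: list) -> list:
    """N3 sketch: world-open ingress on a non-web port -> P0.
    Each rule: {'cidr_blocks': [...], 'from_port': int, 'to_port': int}."""
    web_ports = {80, 443}
    hits = []
    for rule in ingress_rules:
        if "0.0.0.0/0" not in rule.get("cidr_blocks", []):
            continue  # not world-open, N3 doesn't apply
        ports = range(rule["from_port"], rule["to_port"] + 1)
        if any(p not in web_ports for p in ports):
            hits.append(("N3", "P0", rule["from_port"], rule["to_port"]))
    return hits

rules = [
    {"cidr_blocks": ["0.0.0.0/0"],  "from_port": 443, "to_port": 443},  # OK
    {"cidr_blocks": ["0.0.0.0/0"],  "from_port": 22,  "to_port": 22},   # P0
    {"cidr_blocks": ["10.0.0.0/8"], "from_port": 22,  "to_port": 22},   # internal
]
print(n3_findings(rules))  # [('N3', 'P0', 22, 22)]
```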
X — Drift & state rules:
X1. Plan shows changes you didn't intend (drift) — values "changed
outside of Terraform" → P1
X2. Resource referenced in code but absent from state (needs import) → P1
X3. Resource in state with no matching code (needs `terraform state rm`
or restore to code) → P1
X4. Provider version pinning loosened (~> to >=) → P1
X5. Backend config changed (state moves region/bucket/account) → P0
X6. count or for_each shifts re-index a list, causing mass replacement → P0
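X6 deserves a concrete illustration: with count, resource addresses are positional, so deleting one list element shifts every later address and the plan wants to replace resources whose configuration never changed; for_each keyed on a stable value does not. A toy simulation of the addressing, not real terraform behavior:

```python
def count_addresses(items):
    """count-style: addresses are positional indices."""
    return {f"aws_instance.web[{i}]": item for i, item in enumerate(items)}

def for_each_addresses(items):
    """for_each-style: addresses are stable keys."""
    return {f'aws_instance.web["{item}"]': item for item in items}

before = ["a", "b", "c"]
after  = ["a", "c"]  # "b" removed from the middle of the list

def churn(addr_fn):
    """Addresses whose bound item changed or vanished = replacements."""
    old, new = addr_fn(before), addr_fn(after)
    return [a for a in old if new.get(a) != old[a]]

print(churn(count_addresses))     # "c" shifts into index 1: two replacements
print(churn(for_each_addresses))  # only the deliberately removed "b"
```

The usual fixes are migrating to for_each or adding `moved` blocks before applying.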
P — Provider-specific footguns (AWS):
PA1. aws_launch_configuration deprecated → P2 (use launch_template)
PA2. aws_security_group rule moved into inline vs aws_security_group_rule
resource without explicit replace = silent rule loss → P1
PA3. aws_lb_target_group port change forces recreate → P1
PA4. aws_eks_cluster version downgrade attempted → P0
PA5. aws_db_instance allocated_storage decreased → P0
PA6. aws_iam_role assume_role_policy replaced (vs updated) → P1
PA7. aws_acm_certificate replaced when an LB still references the old → P0
P — Provider-specific footguns (GCP):
PG1. google_sql_database_instance deletion_protection=false → P1
PG2. google_compute_instance metadata_startup_script changed (replaces) → P1
PG3. google_project_iam_member vs _binding mixed (binding wipes others) → P0
PG4. google_kms_crypto_key with skip_initial_version_creation=true and
destroy_scheduled_duration<7d → P0
PG5. google_container_cluster master_auth changes force in-place → P1
P — Provider-specific footguns (Azure):
PZ1. azurerm_resource_group destroy (cascades all child resources) → P0
PZ2. azurerm_key_vault soft_delete_enabled toggled off → P0
PZ3. azurerm_storage_account replication type change forces destroy → P0
PZ4. azurerm_role_assignment with scope at subscription level → P1
PZ5. azurerm_virtual_machine deprecated (use _linux/_windows variant) → P2
The agent renders a single decision based on the highest-severity finding:
ANY P0 → BLOCK (do not apply)
ANY P1, no P0 → APPROVE_WITH_CONDITIONS (named reviewer + justification)
P2 only → APPROVE (with advisory comments)
no findings → APPROVE (clean plan)
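The mapping above is deliberately trivial; the whole point of the gate is that it has no judgment calls. A sketch:

```python
def gate(findings: list) -> str:
    """Map a list of finding severities to the single apply decision.
    Sketch of the rules above; findings e.g. ["P0", "P1", "P2"]."""
    if "P0" in findings:
        return "BLOCK"
    if "P1" in findings:
        return "APPROVE_WITH_CONDITIONS"
    return "APPROVE"  # P2-only or clean plan

print(gate(["P2", "P1"]))  # APPROVE_WITH_CONDITIONS
print(gate([]))            # APPROVE
```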
Conditions on APPROVE_WITH_CONDITIONS include: a specific named reviewer, a deploy window restriction (no Friday afternoons), a maintenance-window flag, paired apply (two engineers on call), or a pre-apply backup snapshot.
Drift is the silent killer of IaC. The agent surfaces three categories:
| Category | Marker | Action |
|---|---|---|
| Config drift | "changed outside of Terraform" | Flag P1; suggest terraform refresh then re-plan; or import the manual change as code |
| State drift | resource missing from state but in code | Flag P1; provide terraform import command |
| Code drift | resource in state, removed from code | Flag P1; suggest terraform state rm if intentional, otherwise restore the resource block |
The agent generates the import command with the address and the cloud-side resource id whenever the cloud-side id can be inferred from the plan.
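Rendering the command is mostly a quoting exercise, since indexed addresses contain characters shells interpret. A minimal sketch (the address and id below are invented):

```python
def import_command(address: str, cloud_id: str) -> str:
    """Render a terraform import command for a state-drift finding.
    Single-quote both arguments: brackets and quotes in indexed
    addresses like foo.bar["x"] or foo.bar[0] break unquoted shells."""
    return f"terraform import '{address}' '{cloud_id}'"

print(import_command("module.net.aws_security_group_rule.ssh[0]",
                     "sg-0a1b2c3d_ingress_tcp_22_22_10.0.0.0/8"))
```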
The output mimics a senior reviewer's PR comment, formatted for direct paste into GitHub / GitLab / Bitbucket:
## Terraform Plan Review
**Decision:** BLOCK
**Findings:** 2 × P0, 1 × P1, 3 × P2
---
### P0 — Destroying production RDS
`aws_db_instance.payments_prod` is being destroyed. This is a
`stateful + production` resource. Apply will lose all data not
captured in a manual snapshot.
```diff
-resource "aws_db_instance" "payments_prod" {
- identifier = "payments-prod"
- ...
-}
```

**Fix options:**
1. Add `lifecycle { prevent_destroy = true }` and re-plan; investigate why the resource block was removed from code.
2. Confirm a recent snapshot exists: `aws rds describe-db-snapshots --db-instance-identifier payments-prod`.
3. If the resource only moved between modules, use `terraform state mv` instead of destroy + create.

### P0 — IAM wildcard grant
`aws_iam_policy.deploy` adds `"Action": "*"` on `"Resource": "*"`. This grants the role full account access.

```diff
+  Action   = "*"
+  Resource = "*"
```

Fix: scope to the specific actions the deploy job actually needs.

### P1 — Secret printed to logs
Output `db_password` lacks `sensitive = true`. Plan will print the value to logs.

Fix:

```hcl
output "db_password" {
  value     = aws_db_instance.x.password
  sensitive = true
}
```

### P2 — Provider constraint loosened
`hashicorp/aws` constraint moved from `~> 5.40` to `>= 5.0`. Future plans may pull a major version. Pin tightly.

---
Apply is blocked until the two P0 findings are addressed. Recommended sequencing: restore the `aws_db_instance.payments_prod` block, or land a decommission RFC with snapshot evidence.
### Step 7: CI Integration
The agent emits a ready-to-commit GitHub Action / GitLab CI job:
```yaml
# .github/workflows/terraform-plan-review.yml
name: Terraform Plan Review
on:
pull_request:
paths: ['terraform/**', '**.tf', '**.tfvars']
jobs:
plan-review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: hashicorp/setup-terraform@v3
with: { terraform_version: 1.7.5 }
- run: terraform -chdir=terraform init -backend=false
- id: plan
run: |
terraform -chdir=terraform plan -out=plan.out -no-color
terraform -chdir=terraform show -json plan.out > plan.json
- id: review
uses: ./.github/actions/tf-plan-review
with:
plan-json: terraform/plan.json
gating: strict # block | strict | advisory
- if: steps.review.outputs.decision == 'BLOCK'
run: |
echo "::error::Plan review blocked the apply"
exit 1
- uses: actions/github-script@v7
with:
script: |
github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.issue.number,
body: require('fs').readFileSync('review.md','utf8')
});
```

For Atlantis, Spacelift, env0, and Terraform Cloud, the agent emits the equivalent webhook / pre-apply hook config.
The agent reads each plan with cloud-aware eyes:
AWS lens:
GCP lens:
- google_project_iam_binding overwrites all members on a role — P0 if mixed with _member resources for the same role
- a KMS key version in STATE_PENDING_DELETE cannot be recovered after destroy_scheduled_duration — minimum 7 days
- deletion_protection should be true in prod; a plan toggling it off is P1

Azure lens:
- soft_delete_enabled and purge_protection_enabled interact; turning off either is P0
- account_replication_type change forces destroy + create

For any plan with P1+ findings, the agent emits a pre-apply checklist:
[ ] State backed up (s3 cp / gsutil cp / az storage blob copy)
[ ] Manual DB snapshot taken for any stateful resource being touched
[ ] Maintenance window confirmed with stakeholders
[ ] On-call paged or notified
[ ] Rollback plan documented in PR description
[ ] Apply executed by named engineer (not bot)
[ ] Post-apply: drift re-checked within 30 min
[ ] Post-apply: smoke tests run against affected services
For P0 plans the checklist is moot — the gating is BLOCK and the plan is rejected.
The agent supports four output modes for different integration points:
| Mode | Format | Where It Plugs In |
|---|---|---|
| review-comment | Markdown PR comment with diffs + fixes | GitHub/GitLab PR |
| gating-decision | JSON {decision, p0:[], p1:[], p2:[]} | CI scripts |
| slack-summary | One-screen Slack post with deep links | Deploy channel |
| runbook | Markdown checklist + commands | Pre-apply prep |
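The gating-decision shape can be sketched like this; the finding tuples and field names below are illustrative assumptions, showing the shape rather than a fixed schema.

```python
import json

def gating_payload(findings: list) -> str:
    """Assemble the gating-decision JSON from the table above.
    findings: list of (severity, rule_id, resource_address) tuples."""
    by_sev = {"p0": [], "p1": [], "p2": []}
    for sev, rule, addr in findings:
        by_sev[sev.lower()].append({"rule": rule, "address": addr})
    if by_sev["p0"]:
        decision = "BLOCK"
    elif by_sev["p1"]:
        decision = "APPROVE_WITH_CONDITIONS"
    else:
        decision = "APPROVE"
    return json.dumps({"decision": decision, **by_sev}, indent=2)

payload = gating_payload([("P0", "D4", "aws_db_instance.payments_prod"),
                          ("P1", "S1", "output.db_password")])
print(payload)
```

CI scripts then branch on `.decision` (e.g. with `jq`) without parsing the markdown review.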
Input (excerpt):
# aws_db_instance.payments_prod will be destroyed
- resource "aws_db_instance" "payments_prod" {
- identifier = "payments-prod"
- engine = "postgres"
- ...
- }
# aws_iam_role_policy.deploy will be updated in-place
~ resource "aws_iam_role_policy" "deploy" {
~ policy = jsonencode(
~ {
~ Statement = [
~ {
~ Action = "*"
~ Resource = "*"
~ },
~ ]
~ },
~ )
}
# aws_db_instance.payments_prod will be created
+ resource "aws_db_instance" "payments_prod" {
+ identifier = "payments-prod"
+ ...
+ }
Wait — same identifier, destroy then create? That's a -/+ masquerading as separate destroy + create blocks because the resource address moved between modules. Rule D4 fires (P0).
Decision: BLOCK
P0: D4 (DB recreate), I1 (IAM wildcard)
Input:
Note: Objects have changed outside of Terraform
# aws_security_group.web has changed
~ resource "aws_security_group" "web" {
id = "sg-0a1b2c3d"
~ ingress = [
+ {
+ cidr_blocks = ["192.168.10.0/24"]
+ from_port = 22
+ to_port = 22
+ protocol = "tcp"
},
...
]
}
# aws_security_group_rule.ssh will be created
+ resource "aws_security_group_rule" "ssh" {
+ from_port = 22
+ to_port = 22
+ protocol = "tcp"
+ cidr_blocks = ["10.0.0.0/8"]
+ }
Findings:
Suggested fixes:
- terraform import aws_security_group_rule.adhoc_ssh sg-0a1b2c3d_ingress_tcp_22_22_192.168.10.0/24

Decision: APPROVE_WITH_CONDITIONS — apply will succeed but state will remain inconsistent until the import is done.
Input (excerpt):
# aws_lambda_function.x will be updated in-place
~ environment {
~ variables = {
~ LOG_LEVEL = "info" -> "debug"
}
}
Decision: APPROVE
Findings: none
Comment: "Plan is clean. Single in-place env var update on a Lambda. No P-level findings. Apply approved."
The agent produces:
The agent processes each module's plan independently, then aggregates: per-module decision plus a cross-module dependency check (e.g. module A destroys a KMS key that module B's plan still references). Cross-module references that fail produce an additional P0.
Drift outside Terraform. The agent runs through terraform plan -refresh-only mentally and flags X1 across the board. Fix: refresh-only apply, then audit which fields were touched and bring them under code or back to declared state.
Depends on findings. Clean P2-only plans: yes. P1 plans involving load balancers, DNS, or stateful resources: schedule for low-traffic windows. P0: never (and gate blocks anyway).
The agent's text parser handles standard plan output. JSON gives 100% accuracy on resource addresses and change reasons; text gives ~95% — the gap is mostly in nested module addresses and dynamic blocks.
Atlantis enforces who can apply, not whether the plan is safe. This skill is the safety review; Atlantis is the policy enforcement. They stack.
- Prefer the JSON plan (terraform show -json plan.out) when possible — disambiguates module addresses and change reasons
- Run it on a rendered plan — use tflint / tfsec / checkov for static analysis; this skill needs the rendered plan to grade destructiveness
- On a greenfield stack everything is + create; the destruction matrix doesn't apply, run a separate bootstrap-review skill