Install
openclaw skills install aws-service-chaos-research
Use when the user asks about chaos engineering, fault injection, resilience testing, or HA verification for a SPECIFIC AWS service (e.g., RDS, EKS, MSK, ElastiCache, DynamoDB, S3, Lambda, OpenSearch, etc.). Triggers on "chaos testing on [service]", "fault injection for [service]", "how to test HA of [service]", "FIS scenarios/actions for [service]", "[service] failover testing", "[service] resilience testing", "[service] 混沌测试", "[service] 故障注入", "[service] 高可用验证", "对 [service] 做混沌实验", "test my [service]", "verify my [service] is resilient". Use this skill even when the user phrases it casually like "test my RDS" or "how resilient is my MSK cluster".
Generate comprehensive chaos engineering and high availability testing scenarios for a
specific AWS service. Uses a Scenario-Library-first approach: read the latest FIS
Scenario Library documentation for pre-built composite scenarios first, then query
individual FIS actions via list-actions, and finally supplement with deep documentation
research.
Detect the language of the user's conversation and use the same language for all output.
Required tools (at least one of each group):
FIS Scenario Library (Group A: documentation-based, always available):
- aws___read_documentation: read FIS Scenario Library pages directly (scenarios are console-only and cannot be queried via CLI, so reading the latest docs is the only way to discover them)
FIS Actions Discovery (Group B: use in order of preference):
- aws fis list-actions: definitive, real-time list of FIS actions from the user's region
Documentation Research (Group C):
- aws___search_documentation: search AWS official docs
- aws___read_documentation: read full doc pages
- aws___recommend: discover related pages
All documentation research uses only the AWS Knowledge MCP tools above. Do NOT use SearXNG or other web search tools for documentation research.
CRITICAL — Sequential execution of all AWS Knowledge MCP calls:
All calls to aws___search_documentation, aws___read_documentation, and
aws___recommend MUST be executed one at a time, sequentially. NEVER send
multiple MCP requests in parallel — the aws-knowledge-mcp-server has strict rate
limits and will reject concurrent requests with "Too many requests" errors.
Wait for each request to return a complete response before sending the next one.
This applies to ALL steps below (Step 2, 4b, 4c, 5a, 5b).
Retry on failure: If any MCP call (especially aws___read_documentation) returns
a rate limit error ("Too many requests") or any other transient error, retry up to
10 times with a 5-second wait between retries. Only skip the request after all 10
retries have failed.
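The retry rule above can be sketched as a small shell helper. This is only an illustration of the policy (one initial attempt, then up to 10 retries with a 5-second wait). The MCP tools are not shell commands, so the function name `retry_with_wait` and the wrapped command are assumptions, not part of the skill.

```shell
# retry_with_wait MAX_RETRIES WAIT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND once, then retries it up to MAX_RETRIES
# times, sleeping WAIT_SECONDS between attempts. Returns 0 on the first
# success and 1 once all retries have failed.
retry_with_wait() {
  max_retries=$1
  wait_seconds=$2
  shift 2
  retries=0
  until "$@"; do
    if [ "$retries" -ge "$max_retries" ]; then
      echo "giving up after ${retries} retries: $*" >&2
      return 1
    fi
    retries=$((retries + 1))
    echo "retry ${retries}/${max_retries} in ${wait_seconds}s: $*" >&2
    sleep "$wait_seconds"
  done
  return 0
}
```

With the values from the rule above, a call would look like `retry_with_wait 10 5 some_command`, where `some_command` stands in for whatever performs the request.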
Multi-service requests: When the user asks about multiple services (e.g., "EKS, RDS, MSK, and ElastiCache"), process them one service at a time. Complete all research steps (Steps 2-5) for one service before starting the next. Do NOT launch parallel research for multiple services — this will trigger rate limiting. The Scenario Library fetch (Step 2) only needs to run once since it covers all services; the per-service steps (3-5) must be repeated sequentially for each service.
Extract the target AWS service from the user's message and determine the target region.
FIS actions can differ across AWS regions — some actions may be available in
us-east-1 but not yet in ap-southeast-1. Always determine the target region first,
because service keyword resolution depends on it.
Detection order (use the first one that applies):
- aws configure get region to get the configured default
Store the resolved region as TARGET_REGION for use in subsequent steps.
FIS action IDs follow the pattern aws:<service>:<action>. To map the user's input
to the correct FIS service keyword, use dynamic discovery from the live FIS action list:
aws fis list-actions --region TARGET_REGION | jq '.actions[].id' | awk -F':' '{print $2}' | sort -u
This returns the definitive list of FIS-supported service keywords in that region
(e.g., ebs, ec2, ecs, eks, elasticache, fis, network, rds, s3, ssm...).
Match the user's service name against this list. For example, if the user says
"Aurora", match it to rds; if "Kubernetes", match to eks.
If the AWS CLI is not available, derive the keyword by lowercasing the AWS service name
and removing spaces/hyphens (e.g., "ElastiCache" -> elasticache).
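A minimal sketch of the offline fallback described above, assuming a hypothetical helper named `derive_keyword`. The two aliases are the examples given in the text (Aurora to rds, Kubernetes to eks). Any other alias would need to be checked against the live action list.

```shell
# derive_keyword SERVICE_NAME
# Hypothetical helper: maps a user-supplied service name to an FIS service
# keyword when the AWS CLI is not available.
derive_keyword() {
  lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lowered" in
    aurora*)    echo "rds" ;;   # alias taken from the example in the text
    kubernetes) echo "eks" ;;   # alias taken from the example in the text
    *)          printf '%s\n' "$lowered" | tr -d ' -' ;;  # drop spaces and hyphens
  esac
}
```

For example, `derive_keyword "ElastiCache"` prints `elasticache`.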
If the service is ambiguous, ask the user to clarify (e.g., "RDS MySQL or Aurora MySQL?").
Also determine the deployment architecture if the user mentions it:
This step has the highest priority. The FIS Scenario Library provides AWS-curated composite scenarios that orchestrate multiple fault injection actions into realistic failure simulations. These are the most valuable starting point because they represent AWS's own recommendations for how to test resilience.
Scenario Library scenarios are console-only — they cannot be listed or queried via AWS CLI or API. The only way to discover them is by reading the latest documentation.
Fetch the Scenario Library pages listed in references/search-queries.md under
"FIS Scenario Library Pages (Always Fetch)". Read both the overview and detailed scenario
pages relevant to the target service. Read pages one at a time, sequentially —
wait for each aws___read_documentation call to complete before starting the next one.
After reading the documentation, classify each scenario's relevance:
| Relevance | Criteria |
|---|---|
| Directly relevant | Scenario includes sub-actions that explicitly target the service (e.g., "Failover RDS" in AZ Power Interruption) |
| Indirectly relevant | Scenario affects infrastructure the service depends on (e.g., network disruption affects any VPC-based service) |
| Not relevant | Scenario has no meaningful impact on the target service |
Include both directly and indirectly relevant scenarios in the output.
After the Scenario Library research, query individual FIS actions to discover service-specific fault injection capabilities that may not be covered by composite scenarios.
Step 3a: Fetch ALL FIS actions in the target region:
aws fis list-actions --region TARGET_REGION --query 'actions[].{id:id, description:description}' --output json
Replace TARGET_REGION with the region resolved in Step 1 (e.g., us-east-1).
If no region was determined, omit --region to use the CLI default, but warn
the user that results reflect their default region and may differ in other regions.
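The region handling above could be wrapped as follows. This is a sketch under the assumption that the resolved region is kept in the TARGET_REGION shell variable. The helper name `region_args` is hypothetical.

```shell
# region_args
# Hypothetical helper: prints the arguments to append to an AWS CLI call.
# When TARGET_REGION is empty, it prints nothing and warns on stderr that
# results come from the CLI default region.
region_args() {
  if [ -n "${TARGET_REGION:-}" ]; then
    printf '%s\n' "--region ${TARGET_REGION}"
  else
    echo "warning: no region resolved; results reflect the CLI default region and may differ in other regions" >&2
  fi
}
```

Usage would be `aws fis list-actions $(region_args) --output json`, leaving the substitution unquoted so that the flag and its value are passed as two separate words.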
Step 3b: Filter for target service — from the full list, find actions whose id
contains the search keyword(s) from Step 1:
aws fis list-actions --region TARGET_REGION --query 'actions[?starts_with(id, `aws:KEYWORD:`)].{id:id, description:description}' --output json
Also scan the description field for the service name, because some actions may reference a service in their description even if the action prefix is different.
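One possible way to do the description scan is to post-process tab-separated output from the CLI. The sketch below assumes input lines of the form `id<TAB>description`, which `--query 'actions[].[id,description]' --output text` should produce. The helper name `filter_by_description` is hypothetical.

```shell
# filter_by_description KEYWORD SERVICE_NAME
# Hypothetical helper: reads "id<TAB>description" lines on stdin and prints
# the ids of actions whose id does NOT start with aws:KEYWORD: but whose
# description mentions SERVICE_NAME (case-insensitive).
filter_by_description() {
  awk -F '\t' -v prefix="aws:$1:" -v name="$2" '
    index($1, prefix) != 1 && index(tolower($2), tolower(name)) > 0 { print $1 }
  '
}
```

A call might look like `aws fis list-actions --region "$TARGET_REGION" --query 'actions[].[id,description]' --output text | filter_by_description rds RDS`.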
Step 3c (Optional): Collect cross-cutting actions — these affect services indirectly. Include them if the user's service would benefit from network, API, or infrastructure-level fault injection testing:
aws fis list-actions --region TARGET_REGION --query 'actions[?starts_with(id, `aws:network:`) || starts_with(id, `aws:fis:inject`) || starts_with(id, `aws:ssm:`) || starts_with(id, `aws:ec2:stop`) || starts_with(id, `aws:ec2:terminate`)].{id:id, description:description}' --output json
Cross-cutting actions and when they're useful:
- aws:network:disrupt-connectivity (useful for any VPC-based service)
- aws:network:disrupt-vpc-endpoint (useful for services accessed via PrivateLink)
- aws:fis:inject-api-internal-error (useful to test app handling of AWS API failures)
- aws:fis:inject-api-throttle-error (useful to test backoff/retry logic)
- aws:fis:inject-api-unavailable-error (useful to test graceful degradation)
- aws:ec2:stop-instances / aws:ec2:terminate-instances (useful for services running on EC2)
- aws:ssm:send-command / aws:ssm:start-automation-execution (useful for custom fault scripts)
Whether to include cross-cutting actions depends on context:
Search the FIS actions reference documentation:
aws___search_documentation(
search_phrase="AWS FIS actions [SERVICE_NAME] fault injection",
topics=["reference_documentation"],
limit=10
)
Then read the FIS actions reference page:
aws___read_documentation(
url="https://docs.aws.amazon.com/fis/latest/userguide/fis-actions-reference.html",
max_length=10000
)
Count the number of service-specific actions found (exclude cross-cutting actions).
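The count and the resulting choice between the two research paths can be sketched in shell. This assumes action ids arrive one per line on stdin. The helper names are hypothetical, and the path labels mirror the "FIS-Enriched Path" and "Documentation-Only Path" headings in references/search-queries.md.

```shell
# count_service_actions KEYWORD
# Hypothetical helper: reads action ids on stdin (one per line) and prints how
# many start with aws:KEYWORD:. Cross-cutting prefixes are not counted.
count_service_actions() {
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -c "^aws:$1:" || true
}

# choose_path COUNT
# Prints which research path applies for the given count.
choose_path() {
  if [ "$1" -gt 0 ]; then
    echo "fis-enriched"
  else
    echo "documentation-only"
  fi
}
```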
When FIS has native actions for the target service, combine Scenario Library findings with FIS-action-specific details.
Map each FIS action to a testing scenario. Use the "FIS Native Fault Injection
Scenarios" table format from references/output-template.md.
IMPORTANT — Scenario Library deduplication (must apply before building the table): Before listing any FIS action in the per-service table, check whether that exact action ID appeared as a sub-action in any Scenario Library composite scenario discovered in Step 2. Common examples of overlap:
- aws:rds:failover-db-cluster (sub-action of AZ Power Interruption)
- aws:elasticache:replicationgroup-interrupt-az-power (sub-action of AZ Power Interruption)
- aws:eks:pod-network-latency (sub-action of AZ Application Slowdown)
- aws:eks:pod-network-packet-loss (sub-action of Cross-AZ Traffic Slowdown)
- aws:ec2:stop-instances (sub-action of AZ Power Interruption)
Rules:
replicationgroup-interrupt-az-power which is covered by
AZ Power Interruption), omit the "FIS Native Fault Injection Scenarios"
sub-section entirely and replace with:
All FIS native actions for {SERVICE} are covered by Scenario Library composite scenarios. See the Scenario Library and Cross-Cutting section for details.
Group scenarios by failure domain:
Scenario Library cross-reference: For each FIS action, check whether it also appears as a sub-action in any Scenario Library composite scenario discovered in Step 2. If it does, append a note in the "HA Verification Purpose" column (e.g., "Also a sub-action of AZ Power Interruption — see Scenario Library section"). If all service-specific FIS actions are sub-actions of Scenario Library scenarios, omit the "FIS Native Fault Injection Scenarios" sub-section entirely and replace it with a note: "All FIS native actions for this service are covered by Scenario Library composite scenarios — see the Scenario Library and Cross-Cutting section."
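The overlap check can be sketched as a set difference between two plain-text lists of action ids. Both input files are assumptions: one holds the service-specific ids from Step 3, the other the sub-action ids collected by hand from the Scenario Library pages in Step 2. The helper name `uncovered_actions` is hypothetical.

```shell
# uncovered_actions SERVICE_ACTIONS_FILE SCENARIO_SUBACTIONS_FILE
# Hypothetical helper: prints the action ids from the first file that do not
# appear as whole lines in the second file. Empty output means every
# service-specific action is already covered by a composite scenario.
uncovered_actions() {
  # -F fixed strings, -x whole-line match, -v invert; mask the non-zero exit
  # status that grep returns when nothing is printed.
  grep -Fxv -f "$2" "$1" || true
}
```

If the output is empty, the "FIS Native Fault Injection Scenarios" sub-section would be omitted and replaced by the note quoted above.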
Some services have built-in fault injection beyond FIS. Search for these (sequentially — wait for the search to complete before reading any result pages):
aws___search_documentation(
search_phrase="[SERVICE_NAME] fault injection testing failover simulation",
topics=["general", "reference_documentation"],
limit=10
)
If found, add a "Service Built-in Fault Injection" section using the table format from
references/output-template.md.
Use the search queries from references/search-queries.md under "FIS-Enriched Path".
Run all 5 queries sequentially (one at a time). After searches, read the top 3-5
most relevant pages one at a time and use aws___recommend on the most relevant
page for discovery. Never send multiple read or recommend requests in parallel.
When FIS has no native actions for the target service, fall back to comprehensive documentation research. Note that Scenario Library findings from Step 2 still apply.
Use the search queries from references/search-queries.md under "Documentation-Only Path".
Run all 6 queries sequentially (one at a time, wait for each to complete).
From the combined search results, read the top 5 most relevant pages following the
priority order in references/search-queries.md. Read pages one at a time — wait
for each aws___read_documentation call to complete before the next. Then use
aws___recommend on the service's main documentation page to discover related content.
Extract from all pages:
Use the "Testing Methods (No Native FIS Actions)" section format from references/output-template.md,
including both indirect FIS actions and AWS API/Console methods.
Write the report directly to a local markdown file instead of outputting the full content to the terminal. Use the following file naming convention:
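# Timestamp in Asia/Shanghai time, formatted as YYYY-MM-DD-HH-MM-SS
TIMESTAMP=$(TZ=Asia/Shanghai date +%Y-%m-%d-%H-%M-%S)
# Lowercase the service name and replace spaces, colons, and slashes with hyphens
SERVICE_SLUG=$(echo "{SERVICE_NAME}" | tr '[:upper:]' '[:lower:]' | tr ' :/' '---')
# File name: ${TIMESTAMP}-${SERVICE_SLUG}-chaos-research.md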
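The naming convention above can be packaged as a function so that a single timestamp is reused across several services. This is a sketch only. The helper name `report_filename` and the sample service names are assumptions.

```shell
# report_filename TIMESTAMP SERVICE_NAME
# Hypothetical helper: builds the report file name from a precomputed
# timestamp and a human-readable service name.
report_filename() {
  slug=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | tr ' :/' '---')
  printf '%s-%s-chaos-research.md\n' "$1" "$slug"
}
```

A loop such as `for svc in RDS EKS; do report_filename "$TIMESTAMP" "$svc"; done` would then print one file name per service, all sharing the same timestamp.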
For multi-service requests, generate one file per service:
- ${TIMESTAMP}-rds-chaos-research.md
- ${TIMESTAMP}-eks-chaos-research.md
Compile the report content using the exact format defined in references/output-template.md
and save it to the file. The report must include all sections in this order:
{SVC}-# test IDs, e.g., EKS-1, Redis-1), built-in methods, recommended testing scenario matrix, environment observations, and stop conditions
After saving, print a brief summary to the terminal listing only:
--region to the AWS CLI and clearly state the region in the output.
aws___search_documentation, aws___read_documentation, and aws___recommend.
topics values (general, reference_documentation, troubleshooting) sequentially.
aws___recommend to find related content that keyword search may miss.
aws___search_documentation, aws___read_documentation, and aws___recommend MUST be executed one at a time.
Wait for each response before sending the next request. Parallel calls will trigger
"Too many requests" errors from the aws-knowledge-mcp-server. This is the single
most common cause of failures, so enforce it strictly in every step.