Install
openclaw skills install bookforge-security-change-rollout-planningPlan and execute a security change rollout across a service or fleet: classify the change into a time horizon (short / medium / long-term), triage affected systems by risk tier, select the appropriate rollout strategy with canarying and staged deployment, define communication strategy (internal and external), set rollback and success criteria, and produce a written rollout plan. Use when you need to respond to a zero-day vulnerability, roll out a security posture improvement, or drive an ecosystem or regulatory compliance change. Handles timeline disruption scenarios: accelerate when an exploit goes public, slow down when patch instability is detected, delay when embargo, external dependency, or limited blast radius dictates caution. Produces a rollout plan with timeline, per-tier risk triage, communication strategy, and explicit rollback criteria. Examples covered: Shellshock emergency patch, hardware security key (FIDO/WebAuthn) company-wide deployment, and Chrome HTTPS migration.
openclaw skills install bookforge-security-change-rollout-planningUse this skill when you have a security change to roll out and need a plan — not just a patch, but a structured rollout with appropriate speed, staged deployment, communication, and contingency handling.
The core decision the skill drives is: how fast, and how? Security changes impose a fundamental tradeoff: speed reduces exposure time but increases the risk of rolling out a broken patch that causes downtime or data loss. The right answer depends on whether the change is short-term (a zero-day requiring emergency response in hours), medium-term (a posture improvement that can proceed in weeks), or long-term (an ecosystem migration spanning months to years).
Invoke this skill when:
Do not use this skill to design what the fix should be (that is threat modeling) or to manage an active exploit in real time (that is incident response).
Before producing the rollout plan, gather the following:
If a system inventory is not available, ask the engineer to describe: the three highest-risk systems that must be patched first, the number of systems in scope, and whether nonstandard or inherited infrastructure exists.
WHY: The time horizon determines everything that follows — the planning depth, rollout speed, communication strategy, and success criteria. A plan designed for a week-long rollout is wrong for a same-day emergency. A plan designed for a same-day emergency applied to a years-long ecosystem migration wastes time and breaks systems unnecessarily.
Classify the change using this taxonomy:
| Horizon | Trigger type | Target timeline | Primary risk |
|---|---|---|---|
| Short-term | Zero-day vulnerability, active exploit, critical CVE | Hours to days | Moving too slowly; exploit window grows |
| Medium-term | Security posture improvement, new control, proactive hardening | Weeks to months | Insufficient stakeholder buy-in; long tail of unpatched systems |
| Long-term | Ecosystem change, regulatory requirement, protocol migration | Months to years | Loss of momentum; leadership support eroding; documentation rot |
Classification inputs:
Zero-day response almost always classifies as short-term. Before prioritizing same-day zero-day response, confirm that the "top hits" — the highest-impact vulnerabilities from recent years — are already patched. A zero-day response against a system that is unpatched for known critical vulnerabilities is the wrong priority order.
WHY: Not all systems carry the same risk from a given vulnerability. Treating a low-risk internal analytics server the same as a publicly exposed TLS endpoint wastes capacity and may slow the critical path. Triage directs limited remediation capacity to the highest-leverage systems first.
For each system or system group in scope, assess:
Assign each system to a tier:
| Tier | Definition | Rollout approach |
|---|---|---|
| Tier 1 — Critical | Externally exposed, high blast radius, actively exploitable | Patch first; use accelerated rollout; accept faster-than-normal validation |
| Tier 2 — High | Internally exposed or high blast radius but not directly reachable | Patch in next wave; standard validation |
| Tier 3 — Standard | Low blast radius, standardized infrastructure | Patch via normal automated rollout |
| Tier 4 — Nonstandard | Inherited, heterogeneous, or vendor-dependent systems | Manual intervention; send per-team notifications with explicit action items |
Shellshock pattern: at Google, Tier 1 production servers (standard distribution, easy automated rollout) were patched first and fastest. Tier 2 workstations were easy to patch quickly but required different tooling. Tier 4 nonstandard servers required manual notifications with explicit follow-up actions per team.
WHY: Rolling a change out all at once maximizes exposure risk if the patch is broken or has unexpected interaction effects. A gradual rollout — with instrumentation for canarying — lets you detect problems before they affect your entire infrastructure. This principle holds even on accelerated timelines.
For each change, define rollout phases with explicit gates between them:
Standard phased rollout structure:
Per-phase definition:
Change design properties that make every phase safer (implement these before rollout, not during):
WHY: Advancing through phases without explicit criteria leads to either excessive caution (delayed rollout, extended exposure) or insufficient caution (rolling a broken patch to the full fleet). Pre-defined criteria remove the need for judgment calls under pressure.
Success criteria (must be satisfied before advancing to next phase):
Rollback criteria (trigger immediate rollback):
Rollback mechanism: Define how rollback works before the rollout begins. For binary patches: roll back to the previous version via the same release pipeline. For configuration changes: revert the config and push. For user enrollment changes: define the off-ramp (e.g., allow users to use the legacy path temporarily). If rollback requires significant manual effort, flag this as a risk before the rollout begins.
WHY: A technically sound rollout can fail due to communication gaps. Internal teams that are not informed prepare no coverage and escalate incorrectly. Externally, users and partners who are surprised by behavior changes lose trust. Early and clear communication also prevents PR problems from becoming the bottleneck during a live vulnerability response.
Internal communication:
External communication:
WHY: A written plan enables handoff continuity (individuals leave or rotate), provides a reference during the rollout to avoid judgment calls under pressure, and serves as the audit trail for leadership reporting and post-mortems.
The rollout plan document should include:
Security Change Rollout Plan
=============================
Change: [name / CVE / description]
Horizon: [short / medium / long-term]
Author(s): [points of contact]
Status: [draft / approved / in-progress / complete]
## Summary
[1–3 sentences: what is changing, why, and the target completion date]
## Affected Systems
[Table: System/tier | Risk tier | Patch method | Owner | Estimated effort]
## Phase Plan
[For each phase:]
Phase N — [name]
Scope: [systems or user population]
Timeline: [start date / duration]
Success criteria: [specific measurable conditions]
Rollback trigger: [specific conditions]
Rollback mechanism: [how to execute rollback]
## Communication Plan
Internal: [who is notified, when, via what channel]
External: [who is notified, when, via what channel]
Embargo handling: [if applicable]
## Contingency Branches
Accelerate trigger: [e.g., exploit goes public]
→ Accelerate action: [reorder tiers, prioritize Tier 1 only, accept reduced validation window]
Slow-down trigger: [e.g., patch causing errors above threshold]
→ Slow-down action: [pause expansion, rollback canary, debug and reissue]
Delay trigger: [e.g., vendor patch not available, external dependency not ready]
→ Delay action: [apply mitigations, set new target date, notify stakeholders]
## Success Metrics
[How will you confirm the rollout is complete and the vulnerability is mitigated?]
Classify before planning. The time horizon is not an output of planning — it is the input. Get the horizon wrong and every downstream decision (speed, validation, communication) is calibrated incorrectly.
Patch the top hits first. Before prioritizing same-day zero-day response, confirm that known high-impact vulnerabilities from recent years are already remediated. Zero-day response against a system with unpatched known criticals is the wrong priority order.
Gradual rollout applies even on emergency timelines. Even for a zero-day, roll out in phases with canarying. The risk of deploying a broken patch — causing downtime across the fleet — can exceed the risk of the vulnerability itself during the patching window. On an accelerated timeline, "gradual" means hours rather than days, not "skip canarying."
Standardization is leverage. Rollout speed scales with how standardized your system fleet is. A single OS distribution and package format means a verified patch can be automated immediately. Heterogeneous infrastructure (different distributions, different versions, inherited systems) requires manual intervention that does not scale. Invest in standardization before the emergency.
When plans change, do not panic — but do not be surprised. Embed contingency branches in the plan from the start. Define in advance: what triggers acceleration, what triggers a slowdown, what triggers a full delay. When the trigger fires, execute the prepared branch rather than improvising.
Overcommunicate externally, especially for long-term changes. Organizations resistant to change (including security changes) respond to business-aligned messaging more than security messaging alone. Tie the change to business benefits. Use every available outreach channel. Expect a long tail of non-compliant systems and plan for it explicitly.
Architecture as rollout enabler. The ease of rolling out any security change is determined by architectural decisions made long before the change. Frequent builds and rebuilds ensure the latest patched dependencies are always available. Containers and immutable images allow patch-as-code-rollout. Microservices scope the blast radius of each change and enable independent rollout per service. Automated testing gives confidence to push faster. Build these before the emergency — they are not alternatives to a rollout plan, they are what makes any rollout plan executable.
Scenario: On the morning of September 24, 2014, Google Security learned of a publicly disclosed, remotely exploitable vulnerability in bash that trivially allowed code execution on affected systems. Exploits appeared in the wild the same day.
Horizon classification: Short-term. Actively exploited the same day. Emergency response timeline.
Triage:
Key execution decisions:
Lessons:
Scenario: One-time passwords (OTPs) were susceptible to phishing interception. Starting in 2011, Google evaluated stronger two-factor authentication methods. By 2013, hardware security keys (FIDO/WebAuthn) were selected and rollout began. Full OTP deprecation was completed in 2015.
Horizon classification: Medium-term. Proactive security posture improvement. Gradual, multi-year rollout.
Rollout strategy:
Lessons:
Scenario: HTTPS provides confidentiality and integrity guarantees critical to the web. The Chrome team drove adoption across the entire web ecosystem through a combination of incentives, warnings, and coordination with standards bodies, certificate authorities, and other browser vendors.
Horizon classification: Long-term. External ecosystem change. Required new systems to be built. Target timeline: years.
Rollout strategy:
Result: Chrome users' time on HTTPS sites rose from approximately 70% (Windows) and 37% (Android) to over 90% on both platforms.
Lessons:
Cross-references:
adversary-profiling-and-threat-modeling — determine what you are defending against before designing the changeresilience-and-blast-radius-design — design system architecture to reduce the blast radius of any given change or compromiseThis skill is licensed under CC-BY-SA-4.0. Source: BookForge — Building Secure and Reliable Systems by Heather Adkins, Betsy Beyer, Paul Blankinship, Piotr Lewandowski, Ana Oprea, Adam Stubblefield.
This skill is standalone. Browse more BookForge skills: bookforge-skills