Install
openclaw skills install bookforge-least-privilege-access-design
Analyze a system's access patterns and design least-privilege controls: classify data and APIs by risk, select the narrowest API surface for each operation, define authorization policies with multi-party approval for sensitive actions, establish emergency access override procedures, and optionally introduce a controlled-access production proxy. Use when reviewing access controls for an existing system, designing authorization for a new service, auditing whether engineers have more permissions than their roles require, deciding whether to use a bastion or proxy for privileged operations, or hardening administrative API surfaces against insider mistakes and external compromise. Produces an access classification report, API surface recommendations, authorization policy decisions, and emergency override guidelines.
Use this skill when you need to systematically reduce the damage any one user, automation, or compromised credential can cause, by granting only the access needed and no more.
Invoke it for:
Do not invoke it for selecting the cryptographic authentication mechanism, designing network segmentation, or full threat modeling — those are separate concerns.
Before designing least-privilege controls, gather the following:
If a codebase is available, search for:
.authorized_keys files
WHY: Not all data and actions carry the same blast radius. Treating everything uniformly either over-controls low-risk operations (hurting productivity) or under-controls high-risk ones (accepting unnecessary exposure). A classification framework makes the trade-off explicit and consistently applied.
Classify each data store and API using the access classification matrix. For each resource, determine its sensitivity category and then assess risk by access type:
Sensitivity categories:
| Category | Definition |
|---|---|
| Public | Open to anyone in the organization; limited business impact if exposed |
| Sensitive | Limited to groups with a documented business purpose; medium impact if exposed or corrupted |
| Highly Sensitive | No permanent access; high impact if exposed, corrupted, or deleted (PII, cryptographic secrets, billing data, user credentials) |
Risk by access type (per Table 5-1, Chapter 5):
| Sensitivity | Read access | Write access | Infrastructure access |
|---|---|---|---|
| Public | Low risk | Low risk | High risk |
| Sensitive | Medium/high risk | Medium risk | High risk |
| Highly Sensitive | High risk | High risk | High risk |
Infrastructure access (the ability to change ACLs, reduce logging levels, gain direct shell access, restart services, or otherwise affect service availability) is high risk at every sensitivity level. Even a read of publicly available data can enable catastrophic abuse when it goes through an infrastructure pathway, such as a direct shell, that bypasses normal access controls.
Output of this step: a classification table listing each data store, API group, and role, with its assigned sensitivity category and the risk level per access type.
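As a minimal sketch, assuming Python and a hypothetical inventory that maps resource names to sensitivity categories, the classification step reduces to a lookup against the risk-by-access-type table. The enum names and the `classify` helper are illustrative, not part of the source.

```python
from enum import Enum

class Sensitivity(Enum):
    PUBLIC = "public"
    SENSITIVE = "sensitive"
    HIGHLY_SENSITIVE = "highly_sensitive"

class Access(Enum):
    READ = "read"
    WRITE = "write"
    INFRASTRUCTURE = "infrastructure"

# Risk levels transcribed from the risk-by-access-type table.
RISK_MATRIX = {
    Sensitivity.PUBLIC: {Access.READ: "low", Access.WRITE: "low", Access.INFRASTRUCTURE: "high"},
    Sensitivity.SENSITIVE: {Access.READ: "medium/high", Access.WRITE: "medium", Access.INFRASTRUCTURE: "high"},
    Sensitivity.HIGHLY_SENSITIVE: {Access.READ: "high", Access.WRITE: "high", Access.INFRASTRUCTURE: "high"},
}

def classify(resources):
    """Build the classification table: one row per resource, with its risk per access type."""
    rows = []
    for name, sensitivity in resources.items():
        risks = RISK_MATRIX[sensitivity]
        rows.append({
            "resource": name,
            "sensitivity": sensitivity.value,
            **{access.value: risks[access] for access in Access},
        })
    return rows

if __name__ == "__main__":
    inventory = {  # hypothetical example inventory
        "docs-wiki": Sensitivity.PUBLIC,
        "order-history-db": Sensitivity.SENSITIVE,
        "billing-api": Sensitivity.HIGHLY_SENSITIVE,
    }
    for row in classify(inventory):
        print(row)
```

Keeping the matrix as data rather than scattered `if` statements makes the policy reviewable in one place.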
WHY: A large API surface is the root cause of most over-privilege. When users or automation connect via a broad interface (an interactive shell, a general-purpose admin API, a root-level process), the system can't distinguish what they actually need from what they could do. Narrowing the API to the minimum set of operations required makes it possible to grant the minimum permission and to audit actions precisely.
For each administrative API or access pathway, assess:
abc123" is.
Use the API selection tradeoff matrix (per Table 5-2 in Chapter 5, which uses configuration distribution as its example):
| API approach | API surface | Auditability | Can express least privilege | Complexity |
|---|---|---|---|---|
| POSIX API via SSH | Large | Poor | Poor | High |
| Software update / package manager API | Varies | Good | Varies | High, but reusable |
| Custom scoped command (e.g., SSH ForceCommand) | Small | Good | Good | Low |
| Custom HTTP/RPC sidecar | Small | Good | Good | Medium |
Design rule: Make each API endpoint do one thing well. When you need a new operation, build a new narrow endpoint rather than extending an existing broad one. This applies equally to user-facing APIs and administrative APIs.
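A minimal sketch of the "custom scoped command" row, assuming OpenSSH's `ForceCommand` is configured for the automation key, so the command the client requested arrives in the `SSH_ORIGINAL_COMMAND` environment variable instead of being executed. The `push-config <sha256>` command and the `parse_request` and `main` helpers are hypothetical; the point is that anything outside the one allowed operation is rejected, and every accepted request produces a precise audit line.

```python
import os
import re
import shlex

# Hypothetical narrow API: the only operation this key may perform.
ALLOWED = {
    "push-config": re.compile(r"[0-9a-f]{64}"),  # argument must be a SHA-256 hex digest
}

def parse_request(original_command):
    """Return (operation, argument) if the request matches the narrow API, else None."""
    try:
        parts = shlex.split(original_command or "")
    except ValueError:
        return None  # malformed quoting is rejected outright
    if len(parts) != 2:
        return None
    operation, argument = parts
    pattern = ALLOWED.get(operation)
    if pattern is None or not pattern.fullmatch(argument):
        return None
    return operation, argument

def main():
    request = parse_request(os.environ.get("SSH_ORIGINAL_COMMAND"))
    if request is None:
        print("denied: only 'push-config <sha256>' is permitted")
        return 1
    operation, digest = request
    # A real deployment would fetch and install the config identified by `digest` here.
    print(f"audit: {operation} {digest}")
    return 0
```

The script named in `ForceCommand` would call `main()` and exit with its return value. A new operation becomes a new entry in `ALLOWED`, rather than a wider pattern on an existing one.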
For existing systems with broad APIs (e.g., SSH access to all hosts):
WHY: The appropriate authorization control depends on the risk of the action. Binary yes/no ACLs are sufficient for low-risk reads; high-risk writes on sensitive data require additional controls that distribute trust across multiple parties and create an auditable record.
Match each classified operation to one or more of the following controls:
Access control list (ACL) / group membership — appropriate for:
Multi-party authorization (multi-person approval) — appropriate for:
Business justification (structured) — appropriate for:
Temporary access — appropriate for:
Three-factor authorization — appropriate for:
For highly sensitive infrastructure operations, combine controls: multi-party authorization + temporary access + structured business justification.
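The combination above can be sketched as one check that applies the four controls in turn. This assumes a hypothetical in-memory `GROUPS` table, a hypothetical `TICKET-<number>` justification format, and a one-hour default grant; none of these come from the source, and a real system would query its group service and ticket tracker.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Set, Tuple

# Hypothetical group memberships; a real system would query its group service.
GROUPS = {
    "sre": {"alice", "bob", "carol"},
    "sre-approvers": {"bob", "carol"},
}
# Hypothetical structured-justification format: a ticket ID, not free text.
TICKET_PATTERN = re.compile(r"TICKET-\d+")

@dataclass
class AccessRequest:
    requester: str
    justification: str
    approvers: Set[str] = field(default_factory=set)
    granted_at: Optional[datetime] = None
    ttl: timedelta = timedelta(hours=1)

def authorize(req: AccessRequest, now: datetime) -> Tuple[bool, str]:
    # ACL: the requester must be in the group that owns the operation.
    if req.requester not in GROUPS["sre"]:
        return False, "requester not in ACL"
    # Structured business justification.
    if not TICKET_PATTERN.fullmatch(req.justification):
        return False, "justification must be a ticket ID"
    # Multi-party authorization: at least one approver who is not the requester.
    if not (req.approvers & GROUPS["sre-approvers"]) - {req.requester}:
        return False, "needs approval from a second authorized person"
    # Temporary access: the grant lapses after its TTL.
    if req.granted_at is None or now >= req.granted_at + req.ttl:
        return False, "grant missing or expired"
    return True, "ok"
```

Returning a reason alongside the decision keeps denials diagnosable without exposing more than the caller needs.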
WHY: Authorization controls are only as effective as the audit mechanism that detects when they are circumvented or abused. The value of a narrow API comes not just from preventing misuse, but from making every action attributable and reviewable. Without deliberate audit design, audit logs become noise that nobody reviews.
Audit log requirements:
Granularity: Small functional APIs provide the largest audit advantage. "User pushed config with hash abc123 to host group web-frontend" enables strong assertions. "User opened SSH session" does not. Interactive session transcripts (bash history, script(1)) appear comprehensive but can be bypassed by any user who is aware of their existence.
Auditor selection:
Emergency override audit: Emergency override (breakglass) events must always be reviewed. Weekly team review of all emergency override usage from the previous shift is a practical pattern — it creates cultural accountability and signals when the narrow API is insufficient for real operational needs (which should trigger a fix to the normal API, not normalization of emergency override use).
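Both audit ideas can be sketched together, assuming Python, an in-memory list of records, and hypothetical field names (`actor`, `action`, `target`, `detail`). `to_json` shows the kind of single-purpose record a narrow API can emit; `weekly_breakglass_review` seeds the weekly review by flagging any target overridden at least twice in seven days, an arbitrary illustrative threshold a team would tune.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class AuditRecord:
    time: datetime
    actor: str
    action: str       # one narrow operation, e.g. "push_config" or "breakglass"
    target: str       # host group or resource the action touched
    detail: str = ""  # e.g. the hash of the config that was pushed

def to_json(record: AuditRecord) -> str:
    """Serialize a record as one JSON line for the audit log."""
    data = asdict(record)
    data["time"] = record.time.isoformat()
    return json.dumps(data, sort_keys=True)

def weekly_breakglass_review(records, now, window=timedelta(days=7), repeat_threshold=2):
    """Return the emergency override records in the window and the targets overridden repeatedly."""
    recent = [r for r in records
              if r.action == "breakglass" and now - window <= r.time <= now]
    counts = Counter(r.target for r in recent)
    repeated = sorted(t for t, n in counts.items() if n >= repeat_threshold)
    return recent, repeated
```

A repeated target in the review output is the signal described above: fix the normal API for that operation rather than accept routine overrides.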
WHY: Any authorization system can fail. A bad policy update, a misconfigured ACL, or an urgent production incident may require access that the normal authorization path cannot provide in time. Without a pre-defined, tested emergency access mechanism, engineers will improvise — which introduces uncontrolled risk. With a well-designed one, you get a controlled escape valve that is tightly audited.
Define the emergency access override policy with the following properties:
Access restriction: Emergency override access should be available only to the team directly responsible for the service's operational SLA (typically the SRE team). It should not be broadly available to all engineers.
Location restriction (for zero trust network access): If the service uses zero trust network access (access based on user and device credentials, not network location), the emergency override for bypassing the zero trust control should be available only from specific, physically secured locations with additional physical access controls — sometimes called "panic rooms." This is an intentional exception to the "network location doesn't grant trust" principle, offset by physical controls.
Monitoring: All uses of emergency override must be logged and reviewed. Emergency override use should be rare and surprising. Routine use signals that the normal API is inadequate and must be fixed.
Testing: The emergency override mechanism must be tested regularly by the team responsible for the service. A mechanism that has never been tested may not work when it is needed.
Graceful failure: Design the authorization system to fail in a known, diagnosable way. When a caller is denied access, the denial message should include information proportional to the caller's privilege level — nothing for unprivileged callers (no information disclosure), remediation steps for authorized callers who are incorrectly denied. Provide a denial token that can be used to open a support ticket rather than requiring the caller to describe the failure from memory.
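The emergency override gate and the graceful denial can be sketched together. The in-memory tables `ONCALL_TEAM`, `SECURED_LOCATIONS`, and `DENIALS` are hypothetical stand-ins for a real group service, location attestation, and ticket store; how a location is proven to be a secured site is out of scope for this sketch.

```python
import logging
import secrets

log = logging.getLogger("authz")

ONCALL_TEAM = {"alice", "bob"}        # hypothetical: team that owns the service's SLA
SECURED_LOCATIONS = {"panic-room-1"}  # hypothetical: physically secured override sites
DENIALS = {}                          # hypothetical: token -> decision context for support

def request_breakglass(user: str, location: str, reason: str) -> bool:
    """Grant an emergency override only to the owning team at a secured location."""
    granted = user in ONCALL_TEAM and location in SECURED_LOCATIONS
    # Log every attempt, granted or not, so that all uses can be reviewed.
    log.warning("breakglass user=%s location=%s granted=%s reason=%r",
                user, location, granted, reason)
    return granted

def denial_message(caller_is_privileged: bool, policy: str, remediation: str) -> str:
    """Build a denial whose detail is proportional to the caller's privilege."""
    token = secrets.token_urlsafe(8)
    DENIALS[token] = {"policy": policy, "remediation": remediation}
    if not caller_is_privileged:
        # No policy details for unprivileged callers: avoid information disclosure.
        return f"Access denied. Reference: {token}"
    return f"Access denied by policy {policy}. Remediation: {remediation}. Reference: {token}"
```

The token lets support recover the full decision context from `DENIALS` without the caller having to reconstruct the failure from memory.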
WHY: When fine-grained controls for backend services are not available — because the service is third-party, legacy, or too costly to modify — a controlled-access production proxy can layer authorization, auditing, rate limiting, and multi-party approval on top of the existing interface without requiring changes to the underlying system.
A controlled-access production proxy is appropriate when:
A controlled-access production proxy provides:
Proxy risks and mitigations:
Least privilege applies to humans, automation, and machines equally. The objective extends through all authentication and authorization layers. Automation credentials often accumulate permissions over time — review them with the same rigor as human roles.
Avoid ambient authority. Users and automation should not hold standing access to sensitive resources they do not currently need. Temporary access that expires is always preferable to permanent standing access.
Design for the realistic threat model, not the idealized one. Engineers make typos. Accounts get compromised. Credentials are phished. A system that requires perfect human execution to remain secure is not secure. Design to limit the damage of realistic failure modes.
Small APIs make everything else possible. Narrow, functional APIs are the prerequisite for meaningful audit logs, meaningful least privilege, and meaningful multi-party authorization. A system built on broad interactive APIs cannot be audited or constrained effectively regardless of other controls.
Authorization infrastructure should be shared, not per-service. Separate authorization logic into a shared library or service. This enables org-wide controls (multi-party authorization, multi-factor authorization) to be added at a single layer rather than requiring changes to every service. Standardization also enables team mobility and consistent policy reasoning.
Culture enforces what technology cannot. Multi-party authorization only works if approvers feel genuinely empowered to reject suspicious requests. Emergency override use only remains rare if teams review it regularly and treat frequent use as a signal that the normal API needs improvement. Controls without cultural reinforcement become rubber stamps.
Scenario: An infrastructure team runs a fleet of production servers. Engineers use command-line tools directly against the fleet for administrative tasks. These tools are potentially dangerous — an incorrect scope selector could stop multiple service frontends simultaneously. Centralized logging and authorization are not enforced.
Problem: No audit trail. Engineers can issue arbitrary commands to production. A single engineer account compromise or fat-finger mistake could take down significant fleet capacity.
Least-privilege design:
admin group must have a peer in admin-leads approve before the command executes.
Result: ~13% of production outages that would have been caused by direct human access to production become preventable. The blast radius of any single mistake or compromise is bounded by the rate limiter and multi-party approval requirements.
Proxy policy structure (conceptual):
```
config {
  proxy_role = 'admin-proxy'
  tools {
    restart_job {
      allow = ['group:admin']
      require_approval_from = ['group:admin-leads']
      rate_limit = { max_per_minute = 5 }
    }
    status_check {
      allow = ['group:admin', 'group:sre']
      # No approval required — read-only, low risk
    }
  }
}
```
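One way to enforce the conceptual policy above, sketched in Python under stated assumptions: group membership comes from a hypothetical `GROUPS` table, an approver must be a different person from the requester, and the rate limit is a sliding 60-second window counted per tool across all callers, which bounds fleet-wide blast radius. This is an illustration, not the proxy described in the source.

```python
import time
from collections import defaultdict, deque

# Hypothetical group memberships.
GROUPS = {"admin": {"alice", "bob"}, "admin-leads": {"carol"}, "sre": {"dave"}}

# The conceptual policy, expressed as data.
POLICY = {
    "restart_job": {"allow": ["admin"], "require_approval_from": ["admin-leads"], "max_per_minute": 5},
    "status_check": {"allow": ["admin", "sre"]},  # read-only, no approval required
}

_recent_calls = defaultdict(deque)  # tool -> timestamps of recent allowed executions

def _in_any(user, groups):
    return any(user in GROUPS.get(g, set()) for g in groups)

def check(user, tool, approver=None, now=None):
    """Decide whether the proxy should forward `tool` for `user`."""
    now = time.monotonic() if now is None else now
    rule = POLICY.get(tool)
    if rule is None or not _in_any(user, rule["allow"]):
        return False, "not allowed"
    needed = rule.get("require_approval_from")
    if needed and (approver is None or approver == user or not _in_any(approver, needed)):
        return False, "approval required"
    limit = rule.get("max_per_minute")
    if limit is not None:
        window = _recent_calls[tool]
        while window and now - window[0] >= 60:
            window.popleft()  # drop executions older than one minute
        if len(window) >= limit:
            return False, "rate limited"
        window.append(now)
    return True, "ok"
```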
Scenario: An automation system needs to push a validated configuration file to all web servers in a fleet. The naive approach: SSH to each host as the user the web server runs as, write the file, restart the process.
Problem: The SSH approach exposes the entire POSIX API. The automation role can read any data on the host, stop the web server permanently, start arbitrary binaries, or cause a coordinated outage of the entire fleet. A compromise of the automation credential is equivalent to a compromise of every web server.
Least-privilege design using Table 5-2 logic:
Result: A compromise of the push automation credential cannot write arbitrary content to hosts or run arbitrary processes. The blast radius is limited to pushing a valid (signed) config — which itself requires compromise of the signing system.
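The receiving side of the config push can be sketched as an endpoint that accepts only a config plus its authentication tag and nothing else. To stay within the Python standard library, this sketch swaps in HMAC-SHA256 for the signature. That substitution matters: with a symmetric key on every host, a compromised host could forge tags. A real deployment would verify an asymmetric signature against the signing system's public key. `SIGNING_KEY`, `sign`, and `accept_config` are hypothetical names.

```python
import hashlib
import hmac

# Stand-in only: HMAC is symmetric, so anyone holding this key can also create tags.
SIGNING_KEY = b"hypothetical-shared-key"

def sign(config: bytes) -> str:
    """What the signing system would produce for an approved config."""
    return hmac.new(SIGNING_KEY, config, hashlib.sha256).hexdigest()

def accept_config(config: bytes, tag: str) -> bool:
    """Install `config` only if its tag verifies; the endpoint takes no other input."""
    if not hmac.compare_digest(sign(config), tag):
        return False
    # Writing the file and reloading the web server would happen here (omitted).
    return True
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many leading characters of a forged tag were correct.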
Scenario: Customer support representatives need to access customer account records to resolve tickets. Currently, all support staff have read access to all customer records for all customers at all times.
Problem: Overly broad read access to highly sensitive data. A support staff compromise, or a malicious insider, can exfiltrate all customer data without any specific trigger.
Least-privilege design:
Result: The data surface exposed to any single support interaction is the minimum needed to resolve that case. A compromised support account can only access data for currently open tickets assigned to it — not the entire customer database.
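The ticket-scoped rule can be sketched as a single predicate, assuming a hypothetical `Ticket` record exported from the support system. Access to a customer's records is derived from an open ticket assigned to the agent, never from a standing grant.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    customer_id: str
    assignee: str
    is_open: bool

def can_view_customer(agent: str, customer_id: str, tickets) -> bool:
    """Allow access only through an open ticket for this customer assigned to this agent."""
    return any(
        t.is_open and t.assignee == agent and t.customer_id == customer_id
        for t in tickets
    )
```

Closing or reassigning the ticket revokes access automatically, which is the temporary-access property applied to support work.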
Cross-references:
adversary-profiling-and-threat-modeling: identify which adversaries and attack paths make least-privilege controls most valuable
This skill is licensed under CC-BY-SA-4.0. Source: BookForge, Building Secure and Reliable Systems by Heather Adkins, Betsy Beyer, Paul Blankinship, Piotr Lewandowski, Ana Oprea, Adam Stubblefield.
This skill is standalone.