Install
openclaw skills install agency-osManage and mutate Notion-based agency task and corpus data via slash commands using medium-reasoning Sonnet subagent for accurate dispatch board updates.
openclaw skills install agency-osNotion-as-source-of-truth dispatch board. One Tasks database, one Hub page, one page per Corpus, one page each for General Guidance and Resources. The skill mutates Notion via the Notion MCP (mcp__*__notion-* tools); only references/notion-pointers.json is committed to git.
Skill name decision: the skill is named agency-os (matching the repo). All commands are /agency-os <cmd>. This is the single plugin entry point; there is no agency-os/notion sub-namespace. If you embed this plugin alongside others, prefix commands with agency-os to avoid collisions.
This SKILL.md is the authoritative spec and is harness-agnostic. The slash-command interface (/agency-os ...), the status flow, the schema, the workspace layout, and every command's behavior are the same everywhere.
Per-harness wrappers only differ in two things:
/agency-os ... (or the natural-language equivalent) into chat. Either way, the parser is the same.See docs/harnesses/ for per-harness setup. If you're reading this in a non-Claude-Code harness, treat the subagent dispatch instructions as optional: execute commands inline instead.
Every /agency-os <command> invocation runs on Sonnet at medium reasoning effort via a subagent, not on the orchestrator's model. The work this skill does — resolve an ID, mutate a Notion row via MCP, format a brief — needs enough judgment (dedup checks, brief assembly, dependency reasoning) that Haiku slipped on edge cases; Sonnet at medium effort is the right balance of accuracy and cost. The orchestrator stays free for the conversation around the command.
When the user invokes /agency-os <command> <args> (or the orchestrator translates a natural-language request into one — see "Natural-language driving" below), the orchestrator's only job is to dispatch:
Agent({
description: "Run /agency-os <command>",
subagent_type: "general-purpose",
model: "sonnet",
prompt: "Run the agency-os skill for: /agency-os <command> <args>.\n\nUse medium reasoning effort — think through ID resolution, dedup checks, and brief assembly carefully, but don't over-deliberate on mechanical mutations.\n\nRead .claude/skills/agency-os/SKILL.md and execute that exact command end-to-end: sync preflight (call notion-fetch live — never read from any local cache file), resolve IDs against the live Notion result, mutate Notion via the Notion MCP, and return the same output format the skill specifies (the brief for `start`, the `+ Suggestion: ... -> url` line for `suggest`, etc.). All task/result links must be formatted as CommonMark markdown links — `[title](url)` — never HTML `<a>` tags and never a bare URL. Claude Code renders markdown but not HTML, so HTML anchors show up as literal text. If the command is `start`, also emit the full kickoff brief verbatim. If anything fails (sync, MCP call, ID resolution), stop and report — do not guess.\n\nYOU MUST ALWAYS PRODUCE OUTPUT. Never return silently. On success: the skill's standard output. On failure: one paragraph describing exactly what failed, what was attempted, and what state Notion was left in. Returning nothing is not an option."
})
The orchestrator MUST always relay the subagent's result to the user — no exceptions.
subagent returned no output; execution status unknown. Check Notion directly.Do not re-run any step yourself, do not "double-check" the subagent's work.
Natural-language driving stays on the orchestrator. Parsing "let's discuss the X task" into /agency-os discuss <id>, asking the user clarifying questions during a discussion, deciding when to call log vs add-subtask — that conversation runs on the orchestrator so thread context survives across turns. Only the discrete mutation (each log, each add-subtask, each approve) dispatches to the Sonnet subagent.
Rule of thumb: anything that touches Notion via the MCP -> Sonnet subagent. Anything that's deciding what to touch -> orchestrator.
On Cursor, Cline, Continue, and generic MCP harnesses, the skill can't spawn subagents. Instead:
/agency-os init to store your preferred models in .claude/skills/agency-os/config.json./agency-os run): read config.json and use the stored models' constraints when evaluating task complexity.The config file has this shape:
{
"harness": "cursor",
"models": {
"haiku": "claude-haiku-4-5-20251001",
"sonnet": "claude-sonnet-4-6",
"opus": "claude-opus-4-7"
}
}
The /agency-os run command still uses the same picker heuristic (Haiku for mechanical work, Sonnet default, Opus for strategic) — but on non-Claude harnesses, you need to ensure your harness has an API or SDK integration with those models (e.g., via the Anthropic SDK in a Continue custom tool). If your harness only supports one model, set all three to that model and the skill will use it for everything.
This skill is the one place where the project's local-vs-Notion split is enforced. Get this wrong and you either spam MCP tokens reading what should be a file, or you put mutable state somewhere git can't roll it back.
.claude/skills/, every doc under docs/, the notion-pointers.json binding, and references/general-guidance.md (the canonical General Guidance text — Notion's page is a one-way mirror). Agents read these directly off disk — fast, free, greppable, diffable.git log.references/general-guidance.md -> Notion General Guidance page is the only sync direction. Edit the local file, then push to Notion (via scaffold or a manual update). Never edit the Notion page and try to reverse-sync it; drift will follow.When this skill assembles a brief for an agent, it pulls task state from Notion (the row, the discussion log, subtask titles) and stable spec from local files (general guidance, corpus local guidance if those are mirrored, links into docs/ and .claude/skills/). The brief contains pointers to local files, not copies of them, so the agent can grep further when it needs to.
# setup
/agency-os scaffold # idempotent: build / verify the workspace
/agency-os init [--harness claude-code|cursor|cline|continue|generic-mcp] [--haiku=<model>] [--sonnet=<model>] [--opus=<model>]
# configure model selection (non-Claude harnesses only; Claude Code ignores this)
/agency-os sync # preflight: verify live Notion connection and pointer IDs
# suggestions
/agency-os suggest "<title>" [--corpus=<s>] [--type one-time|recurring]
[--cadence daily|weekly|biweekly|monthly|quarterly|yearly]
[--notes "..."] [--effort S|M|L|XL]
# clarification & subtasks
/agency-os discuss <id> # Suggestion -> Discussion + load context for clarification
/agency-os log <id> "<entry>" # append a discussion entry
/agency-os add-subtask <parent-id> "<title>" [--effort=<e>] [--notes "..."] [--deps=<id1>,<id2>,...]
/agency-os approve <id> # Discussion -> To-Do; cascades active subtasks
# execution
/agency-os start <id> # To-Do -> In Progress + emit kickoff brief
/agency-os refresh # auto-enumerate Status=To-Do AND Exec=Agent via Notion REST API, write state/todo-ids.json
/agency-os run [--go] # batch-execute the To-Do sidecar; orchestrator picks model per task
/agency-os done <id> [--result-link <url>] [--note "..."]
/agency-os kill <id> [--reason "..."]
# read
/agency-os next [N] [--corpus=<s>]
/agency-os status
/agency-os list <suggestion|discussion|todo|inprogress|done|killed|recurring|all> [--corpus=<s>]
/agency-os show <id> [--section description|discussion|donelog|all] [--entry <date>]
# escape hatches
/agency-os update <id> [--title="..."] [--notes="..."] [--priority=1|2|3|4]
[--effort=S|M|L|XL]
[--type=one-time|recurring] [--cadence=...] [--corpus=<s>]
[--deps=<id1>,<id2>,...|none]
/agency-os move <id> --to <status> # force any status transition
/agency-os add-corpus "<name>" [--goal "..."]
Natural language is also a trigger. When the user says "add a suggestion to ...", "let's discuss the X task", "log: agreed to ...", "add a subtask: ...", "approve that", "make weekly task recurring", "mark X done with link <url>", "kill the Y idea" — the skill mutates Notion the same way it would for the slash command equivalent. The slash commands are the canonical interface; natural language is a convenience layer.
Suggestion --discuss--> Discussion --approve--> To-Do --start--> In Progress --done--> Done
^
any --kill--> Killed (terminal) |
|
Recurring tasks: done logs an occurrence and |
loops back to To-Do with Last Done updated. ---+
| Status | Meaning | Set by |
|---|---|---|
Suggestion | Idea in the inbox; not yet discussed | suggest, manual Notion add |
Discussion | Under clarification; subtasks may be emerging; not yet approved | discuss |
To-Do | Approved scope; scheduled to execute | approve, recurring loop on done |
In Progress | An agent is actively working it; brief has been emitted | start |
Done | Closed (one-time only) | done (when Type=one-time) |
Killed | Intentionally dropped | kill |
The picker filters on Status == "To-Do", prioritising 1 then 2 then 3 then 4 then unset, and within that by Created ascending. The launcher requires Status == "To-Do". The suggestor refuses dupes against any non-terminal status by Title-Jaccard >= 0.8.
If start crashes, the row sits at In Progress. Manual recovery: /agency-os move <id> --to todo or flip Status in Notion directly.
Before every command, call notion-fetch <tasks_data_source_id> (from notion-pointers.json) to get live data from Notion. Resolve <id-or-substring> against the live result. Never read from notion-cache.json or any local snapshot. Mutations also target Notion directly.
If notion-fetch fails (Notion API down, OAuth expired), print sync failed: <reason> and abort — do not fall back to any cached file.
Hub (page)
+-- intro: what this board is for
+-- General Guidance -> page
+-- General Plan -> table; one row per Corpus, linked to its page
+-- Suggestions Inbox (linked DB view: Status=Suggestion, sort Created desc)
+-- In Discussion (linked DB view: Status=Discussion, sort Created desc)
+-- To-Do (Scheduled) (linked DB view: Status=To-Do, sort Priority asc, group by Corpus)
+-- Recurring (linked DB view: Type=recurring, sort Last Done asc)
+-- In Progress (linked DB view: Status=In Progress)
+-- Recently Done (linked DB view: Status=Done, sort Done At desc, limit 25)
+-- Resources -> page
Tasks (database)
+-- one row per task, of any status, including subtasks
| Property | Type | Notes |
|---|---|---|
Title | title | Imperative phrase, <=80 chars |
Status | status | Suggestion, Discussion, To-Do, In Progress, Done, Killed. Default Suggestion |
Type | select | one-time (default), recurring |
Cadence | select | daily, weekly, biweekly, monthly, quarterly, yearly. Empty for one-time |
Last Done | date | Set on done for recurring |
Corpus | select | One of the configured corpora |
Priority | select | 1, 2, 3, 4 — urgency (1 = blocks something this week / 2 = this month / 3 = this quarter / 4 = nice to have / default). Lower number = higher urgency. Default 4 |
Impact | select | low, medium, high, outsized — outcome size within the task's corpus, independent of urgency. Default medium |
Effort | select | S, M, L, XL. Default M |
Exec | select | none (default), Agent, Human. Operator-set gate: only Agent-marked To-Do rows enter the run queue |
Parent Task | relation (self) | Set on subtasks; empty for top-level tasks |
Subtasks | rollup | Auto-rolled from inverse of Parent Task; surfaces count |
Created | created_time | Automatic |
Done At | date | Set on terminal Done (one-time) |
Result Link | url | Live link, post URL, PR URL, etc. |
Tags | multi_select | Cross-cutting concerns (user-defined) |
Dependencies | relation (self) | IDs of tasks that must reach Done before this row is dispatchable. Used only by run to stage execution; ignored by start, next, list. Empty by default |
Every task page uses toggleable H2 sections so the DB grid stays clean and details fold on click. Created via Notion's is_toggleable: true heading flag (Notion API). If the MCP path can't set the flag, fall back to plain H2; the user can toggle manually.
> Launch this task
> Paste in Claude Code: `/agency-os start <id>`
Description <- starts expanded
<freeform: what to do, why, acceptance criteria, links to docs/skills>
Subtasks <- starts expanded if any subtasks exist, else hidden
(linked DB view: Parent Task = this, sort Created asc)
Discussion log <- starts collapsed
### 2026-01-10 — initial clarification
Q: ...
A: ...
Decisions:
- ...
Done log <- starts collapsed; only meaningful for recurring
### 2026-01-12: completed by agent — link <url>
Related <- starts expanded; one-line each
Corpus: [-> <corpus>](<corpus-url>)
General guidance: [-> Guidance](<guidance-url>)
Parent: [-> <parent-title>](<parent-url>) <- only for subtasks
# Corpus: <name>
## Goal
1-3 sentences on what "done" looks like for this whole corpus.
## Local guidance
Conventions, owners, references. Anything an agent needs before its first task here.
## Tasks
(linked DB view: Corpus = this, group by Status)
Project-wide rules: link into docs/, link into .claude/skills/, the launch flow, house style. Kept short. Links beat duplication. Seeded from references/general-guidance.md — edit the local file, push the mirror.
.claude/skills/agency-os/references/notion-pointers.json (committed):
{
"hub": { "page_id": "<uuid>", "url": "...", "title": "..." },
"tasks_database": {
"database_id": "<uuid>",
"data_source_id": "<uuid>",
"url": "...",
"task_id_prefix": "OS"
},
"guidance": { "page_id": "<uuid>", "url": "...", "title": "..." },
"resources": { "page_id": "<uuid>", "url": "...", "title": "..." },
"corpora": {
"General": { "page_id": "<uuid>", "url": "...", "title": "General" }
},
"hub_views": { "Suggestions Inbox": "view://<uuid>", ... },
"corpus_views": { "General": "view://<uuid>" },
"schema_summary": { ... }
}
init [--harness=...] [--haiku=...] [--sonnet=...] [--opus=...]Non-Claude harnesses only. Configure which models to use for task execution. Claude Code harnesses ignore this; they have built-in model selectors.
Why it matters: Cursor, Cline, Continue, and generic MCP harnesses can't spawn subagents with different models on the fly. Instead, store your preferred models upfront in .claude/skills/agency-os/config.json, then the skill uses them during batch execution.
Interactive mode (recommended):
/agency-os init
Prompts:
Stores in .claude/skills/agency-os/config.json and prints config: created -> <path>.
Non-interactive mode:
/agency-os init --harness cursor --haiku claude-haiku-4-5 --sonnet claude-sonnet-4-6 --opus claude-opus-4-7
If a harness doesn't support a model (e.g., Cursor is configured for only Sonnet), pass the model it does support for all three tiers; the skill will use it for everything.
If config.json already exists, re-running init overwrites it. To reset: /agency-os init interactively.
scaffoldIdempotent setup. If notion-pointers.json exists and every ID resolves via notion-fetch, prints scaffold: already in place and exits. Otherwise creates only what's missing.
database_id and data_source_id. The Dependencies property is a self-relation on Tasks (separate from Parent Task — that's the hierarchy relation; Dependencies is the gating relation used only by run).references/general-guidance.md.General, Recurring (user can customise via --corpora flag or add later with add-corpus). Seed each from references/corpus-template.md.references/resources.md if present.Task ID as the leftmost column so subtask IDs are reachable at a glance.notion-pointers.json.page_full_width, so Hub / Guidance / Resources / each Corpus page / new task pages need the ... menu -> "Full width" toggle flipped on by the operator after creation. Surface this in scaffold's final output so the operator knows to do it.Parent Task (parent) / Subtasks (children) relation. Path varies by Notion version: typically ... menu -> "Sub-items" -> pick Parent Task. Once wired, every view in the Hub gets a chevron on rows with children.add-corpus "<name>" extends the General Plan post-scaffold: appends a Corpus select option, creates the page, adds the filtered view, updates pointers. Print: + Corpus: [<name>](<url>).
suggest "<title>" ...Add a row in Suggestion status.
--corpus is in pointers (else list and refuse); if --type=recurring, --cadence is required.{Suggestion, Discussion, To-Do, In Progress}.notion-create-pages with parent = Tasks data source. Properties: Title, Status=Suggestion, Corpus, Type, Cadence (if recurring), Effort. Page body = task-page-template.md rendered.--notes provided, write into the Description section.+ Suggestion: [<title>](<url>).discuss <id>Begin clarification on a Suggestion. Flips status to Discussion and prepares the agent to ask clarifying questions.
<id> (UUID, URL, or unique Title substring against Suggestion rows).notion-update-page Status -> Discussion.Ready to clarify. Ask your questions or paste new requirements; the skill will log them with /agency-os log <id> and create subtasks with /agency-os add-subtask <id>. [Open task](<task-url>)log for each round, calls add-subtask whenever the user's responses imply concrete new work.discuss does not require status to be Suggestion — calling it on an already-Discussion row is fine and reloads the brief.
log <id> "<entry>"Append a discussion entry to the task page.
<id>.Discussion log toggle on the page. Append a new dated entry:
### <YYYY-MM-DD> — <auto-summary first 6 words of entry>
<entry>
+ Logged: [<title>](<url>).+ they're treated as proposed subtasks and surfaced in the output: note: detected N proposed subtasks; create with /agency-os add-subtask <id> "<title>".The agent should call log multiple times during a discussion — once per Q/A round, or once per cohesive thought — rather than dumping a single megalogue at the end. This keeps the log queryable by date.
add-subtask <parent-id> "<title>" ...Create a subtask row.
<parent-id>. Refuse if parent's status is Done or Killed.notion-create-pages with parent = Tasks data source. Properties: Title, Status = parent's status (Discussion or To-Do typically), Corpus = parent's corpus, Parent Task = parent, Type = one-time, Effort from flag or default, Dependencies from --deps if provided (each id resolved live via Notion; refuse if any id is unknown).### <date> — subtask added: [<title>](<url>).+ Subtask of <parent-title>: [<title>](<url>){ deps=N}.Subtasks can have their own subtasks (nesting allowed; the skill doesn't enforce a depth limit but warns at depth >= 3).
The hardest part of using this board well is deciding what shape a piece of work takes. Three rules:
1. A task is a coherent unit with a clear "done" state. It can be shipped, merged, published, decided. "Set up X integration" is a task. "Think about X" is not.
2. A subtask is a child task that can be completed independently and is bounded by a deliverable, not by a step. "Write the onboarding blurb" is a subtask because the blurb is a separable artifact with its own done. "Click submit on the form" is not a subtask — log it in the discussion or done note instead.
Rule of thumb: if you'd naturally hand it to a different agent on a different day, it's a subtask. If you'd do it inline while working the parent, it's a step.
3. A log entry is a decision, clarification, or update on existing scope. "Decided to launch Tue not Wed" is a log entry. "Operator handles the review thread" is a log entry, not a subtask.
When the user says "save this to Notion", "make this a task", "track this in Notion", "capture this conversation" mid-chat, read the conversation as a tree, not a transcript:
/agency-os suggest "<title>" --corpus=<inferred>./agency-os discuss <id>./agency-os log <id> "<distilled>"./agency-os add-subtask <id> "<title>" per item.Captured to <url> in Discussion. Approve when you're ready to schedule.approve <id>Promote a task from Discussion -> To-Do, cascading active children.
<id>. Verify status is Discussion (or Suggestion — fast-track allowed). Refuse otherwise.{Suggestion, Discussion}. For each, set Status -> To-Do.--priority provided, set on the parent only.### <date> — approved entry to the parent's Discussion log.-> To-Do: [<title>](<url>) (cascaded N subtasks).start <id> (alias: launch)Move To-Do -> In Progress and emit the kickoff brief.
Sync preflight.
Resolve <id> (UUID, URL, or substring against To-Do rows).
Verify status is To-Do. If Suggestion -> refuse with discuss it first. If Discussion -> refuse with approve it first. If In Progress -> soft-allow (re-emits brief).
notion-update-page Status -> In Progress.
Assemble the kickoff brief, in this exact order:
## Task
[<title>](<notion-url>) [<corpus> / <priority> / type=<type>{ cadence=<cadence>}{ last_done=<date>} / effort=<effort>]
## Description
<Description toggle body>
## Subtasks (N)
- [Status] <subtask title> -> /agency-os start <subtask-id>
- ...
## Latest discussion entry (of <K> total)
<most-recent entry verbatim>
(For older entries: /agency-os show <id> --section discussion --entry <date>)
## Corpus: <name>
<Goal + Local guidance from the corpus page>
## General guidance
<full general guidance page body>
End with: Brief loaded. Proceed.
Overfeed protection. The brief never embeds:
refreshAuto-enumerate the agent-runnable To-Do set and write it to state/todo-ids.json. No arguments. The operator's only job upstream is to mark rows in Notion with Exec=Agent; refresh then fetches them via the Notion REST API and the sidecar is the enumeration substrate for run.
The currently installed Notion MCP does not expose property-filtered enumeration of a data source, so this command shells out to scripts/query-tasks.py, which posts to POST /v1/data_sources/{id}/query with a server-side Status="To-Do" AND Exec="Agent" filter (Notion API version 2025-09-03). The integration token (NOTION_KEY in .env) must be shared with the Tasks database.
Run:
python .claude/skills/agency-os/scripts/query-tasks.py
The script:
NOTION_KEY from .env and tasks_database.data_source_id from references/notion-pointers.json.has_more/next_cursor.description_preview (the text between the Description H2 and the next H2; first 200 chars)..claude/skills/agency-os/state/todo-ids.json:
{
"refreshed_at": "<iso>",
"tasks": [
{
"id": "<uuid>",
"url": "https://www.notion.so/...",
"title": "...",
"corpus": "General",
"priority": "3",
"effort": "M",
"type": "one-time",
"cadence": null,
"last_done": null,
"exec": "Agent",
"parent_task_id": null,
"has_todo_subtasks": false,
"description_preview": "<first 200 chars of Description>",
"dependencies": [
{ "id": "<uuid>", "status": "Done" },
{ "id": "<uuid>", "status": "To-Do" }
]
}
]
}
refreshed: <N> agent-runnable To-Do tasks -> state/todo-ids.json followed by one line per task.Failure modes. The script aborts with a non-zero exit and an explanatory message if NOTION_KEY is missing, the integration is not shared with the database, or the API returns an error. The existing sidecar is overwritten only after the query succeeds end-to-end.
run [--go]Batch-execute every task in state/todo-ids.json (which only contains rows with Status == "To-Do" AND Exec == "Agent").
Claude Code: The Haiku subagent builds the plan from the sidecar; the orchestrator picks a model per task at runtime and spawns execution agents.
Non-Claude harnesses: The skill reads config.json to determine available models. If config.json doesn't exist, run /agency-os init first.
Auto-refresh. run always calls refresh as its first step. If refresh fails, run aborts.
scripts/query-tasks.py via Bash — this is mandatory and must happen before reading anything. Run python .claude/skills/agency-os/scripts/query-tasks.py and verify it exits 0. Only then read the freshly written state/todo-ids.json. Never read the sidecar without running the script first; the file on disk is always stale.has_todo_subtasks: true, skip the parent — its work IS its subtasks.dependencies: [{id, status}]. For every dep:
status == "Done" -> satisfied, ignore.id is in the current in-batch set -> record as an intra-batch edge.blocked_deps[].1 + max(stage of its in-batch deps) (stage 1 = no in-batch deps). If a cycle is detected, abort.stages: [[(id, title, corpus, effort, parent_id, description_preview), ...], ...] plus blocked_deps.Before spawning any execution agent, the orchestrator prints the plan outline (see ### Output below) so the user sees which tasks are about to fire, in which stages, with which model per task. Then dispatching stage 1... and dispatch begins.
Stages run sequentially: every task in stage N must finish before stage N+1 starts. Within a stage, tasks fan out in parallel.
If any task in stage N closes as not Done, stage N+1 tasks that depend on it are dropped; added to the run summary's blocked-deps. Stage N+1 tasks whose deps all closed Done still run.
Claude Code: The orchestrator picks a model and spawns an execution agent per task.
Non-Claude harnesses: The skill reads config.json to determine which models are available, then suggests a complexity level (easy/med/hard) for each task based on the same heuristic.
Picker heuristic (same on all harnesses):
Cap concurrency at 5 parallel execution agents per stage.
Status discipline is non-negotiable. Every spawned agent MUST leave the row in a terminal-for-this-run state before returning. No exceptions, no "leave it at In Progress for the operator to see":
| Outcome | Final Status | Closer command |
|---|---|---|
| Full completion | Done (one-time) / To-Do w/ Last Done bumped (recurring) | /agency-os done <id> --result-link <url> --note "..." |
| Partial completion | Discussion | /agency-os move <id> --to discussion + /agency-os log <id> "partial: <what's left>" |
| Blocked on operator action | Discussion | /agency-os move <id> --to discussion + /agency-os log <id> "blocked-operator: <what operator must do>" |
| Needs clarification | Discussion | /agency-os move <id> --to discussion + /agency-os log <id> "needs-clarification: <question>" |
| Failed (crash, tool error, dead-end) | Discussion | /agency-os move <id> --to discussion + /agency-os log <id> "failed: <what broke>" |
Rationale: leaving a row at In Progress after the agent has stopped working is a lie about live state. The dashboard ends up cluttered with rows nothing is actually working on, and the next run can't tell whether to retry. Discussion is the correct holding pen for "an agent looked at this and could not close it" — the operator sees a real queue of things needing attention, and a follow-up /agency-os approve <id> is the explicit "try again" signal.
Every spawned agent:
Calls /agency-os start <id> to load the kickoff brief AND flip the row To-Do -> In Progress. This MUST be the first call; if the row is already In Progress (re-dispatch), start is idempotent and re-emits the brief.
Runnability check: can this task plausibly be completed end-to-end by an agent, or does it require operator action (logging into a personal account, solving a captcha, clicking publish in a UI without API access)?
/agency-os log <id> "blocked-operator: <one-line what the operator must do>", then /agency-os move <id> --to discussion, then emit the result report (step 6) with status: blocked-operator. Do not skip the status flip.Otherwise: execute the brief end-to-end.
Self-assessment. Before closing: did I complete 100% of the acceptance criteria? Partial completions are not Done.
Auto-close — required, every outcome. Pick the closer command from the table above and run it BEFORE emitting the result report. Verify the closer returned success. If the closer itself errors (e.g. Notion API hiccup), retry once; if it still fails, surface that in the result report's summary line as status: failed with the closer error appended — but still emit the report.
Result report — required, every run, no exceptions. The agent's final chat output MUST be a single block in this exact format. Returning nothing is not allowed — not on success, not on failure, not on a crash mid-execution. If the agent hit an unrecoverable error before it could do anything meaningful, it still emits the block with status: failed and describes what happened. The status: line in the report must agree with the final Notion status: done <-> Done, every other status <-> Discussion.
### <task-id> — [<title>](<notion-url>)
status: done | blocked-operator | needs-clarification | failed
model: haiku | sonnet | opus
result-link: <url or ->
summary: <1-2 sentences: what was done, or what blocked it>
next-step: <only if status != done; what operator should do next>
#### Full output
<verbatim full output of the agent's execution — every step taken, every tool result summary, every decision made. Do not truncate. If the agent produced no meaningful output beyond the header fields above, write "(no output)" here.>
Orchestrator accountability. After each stage, the orchestrator must confirm it received a result block from every agent it spawned. If an agent returned empty output or no output:
status: failed, summary: agent returned no output, full output: (agent returned no output).When a subtask transitions To-Do -> In Progress (via start), the skill also flips its parent To-Do -> In Progress, if the parent is currently To-Do. The parent stays In Progress until the operator (or a deliberate later /agency-os done <parent-id>) closes it.
The plan outline is ALWAYS printed first — both in dry-run (without --go) and in real dispatch (with --go). The user must see what's about to fire before any agent spawns. In --go mode, after printing the outline, immediately proceed to dispatch — do not pause for confirmation (the --go flag already is the confirmation).
Emit the outline as plain markdown (no fenced code block, so the links are clickable):
plan (<N> tasks, <S> stages):
<K> tasks, parallel):
<L> tasks, parallel, after stage 1):
If any tasks were dropped for external blockers, follow with:
blocked-deps (<B> tasks, not dispatched):
If blocked-deps is non-empty, also print: note: <B> task(s) have dependencies outside this batch. Approve the missing deps or run them first, then /agency-os run again.
In dry-run mode, the outline IS the entire output — stop here, fire nothing.
In --go mode, after the outline, print a one-line marker — dispatching stage 1... — then begin dispatch. After completion, the orchestrator emits two more sections:
1. Per-task detail — one block per task executed, in stage order, verbatim from each execution agent's result report:
---
### <task-id> — [<title>](<notion-url>)
status: done | blocked-operator | needs-clarification | failed
model: haiku | sonnet | opus
result-link: <url or ->
summary: <...>
next-step: <...>
#### Full output
<verbatim agent output>
---
2. Run summary — after all per-task blocks. Do NOT wrap the summary in a fenced code block (```), because markdown links inside code fences render as literal text in Claude Code. Emit the summary as plain markdown so [title](url) links are clickable:
run summary (T queued, S stages):
<N>): title, title, ...<M>): title, ...<P>): title, ...<B>): title (dep: dep-title), ...<Q>): title, ...Omit any row whose count is 0.
blocked-deps entries also surface which dep blocked them. The orchestrator must emit both sections — the per-task detail AND the summary — every time --go is used.
done <id>Close a task. Branches on Type.
<id> against rows in In Progress (preferred) or To-Do (allowed).Type == "one-time":
--result-link if given.### <date>: done — <note or "(no note)"> to the Done log.Done, surface a nudge: note: all subtasks of <parent> are done — consider /agency-os done <parent>.Type == "recurring":
### <date>: done — <note> to the Done log.✅ Done: [<title>](<result-link>) (if no result-link, omit the link and print title as plain text).kill <id> [--reason "..."]Terminal drop.
<id> against any non-terminal row.### <date>: killed — <reason> to the Done log.parent killed.✗ Killed: [<title>](<url>) ({<reason>}) (cascaded N descendants).next [N] [--corpus=<s>]Show top N (default 3) To-Do tasks. Does not execute.
Status == "To-Do" and (if --corpus given) matching corpus and Parent Task IS NULL (only top-level).<idx>. [<priority>][<corpus>][<type>{<cadence>}][<effort>] [<title>](<url>) (/agency-os start <id>).For recurring tasks, "overdue" = now - Last Done exceeds the cadence interval (daily=1d, weekly=7d, biweekly=14d, monthly=30d, quarterly=90d, yearly=365d).
statusCompact health overview.
list <view> [--corpus=<s>]Filtered listing. View in suggestion | discussion | todo | inprogress | done | killed | recurring | all.
Output: [Status][Type][Priority][Corpus] [<title>](<url>).
Subtasks are shown indented under their parent unless --flat is passed.
show <id> [--section ...] [--entry <date>]Read a task's content without mutating state. Always include a [<title>](<url>) link at the top of the output before showing the requested content.
--section description: Description only.--section discussion [--entry <date>]: discussion log; one specific entry if --entry given.--section donelog: done log entries.--section all: everything (full page body + subtask titles).update <id> ...Mutate properties without changing status. All flags optional. Print: ✓ Updated: [<title>](<url>).
--notes "..." replaces the Description toggle body. To append without replacing, use log instead.
--deps=<id1>,<id2>,... replaces the Dependencies relation (each id resolved live via Notion; refuse if any is unknown). --deps=none clears it. Self-reference and cycles are refused.
move <id> --to <status>Force-set status to any value (escape hatch). Bypasses normal flow gates. Logs a ### <date>: forced move <from> -> <to> line to the Discussion log. Print: -> <status>: [<title>](<url>).
When the user says these things in chat, the skill translates to commands:
| User says | Skill calls |
|---|---|
| "add a suggestion: <title>" / "new idea: <title>" | suggest "<title>" (asks for corpus if ambiguous) |
| "let's discuss X" / "open X for discussion" | discuss <id> |
| "log: <thing>" / "note that <thing>" (during discuss) | log <id> "<thing>" |
| "add a subtask: <title>" / "we'll also need to <thing>" | add-subtask <parent-id> "<title>" |
| "approve" / "approve it" / "ship it" / "go ahead" | approve <id> |
| "start X" / "launch X" / "let's do X now" | start <id> |
| "run todo" / "run all" / "execute the queue" | run (dry-run) -> user reviews -> run --go |
| "X is done [link <url>]" / "mark X done" | done <id> [--result-link <url>] |
| "kill X" / "drop X" / "X is no longer relevant" | kill <id> |
| "make X recurring weekly" / "X is recurring monthly" | update <id> --type=recurring --cadence=weekly |
| "what's next" / "what should I work on" | next [N] |
| "show me X" / "details on X" | show <id> |
| "what's the status" / "where are we" | status |
The skill is the user's hands. When in doubt, the agent asks: "Should I <command equivalent> for that?" and only mutates after confirmation. But trivial appends (a log entry during an open discuss session, an add-subtask from an explicit "we'll need to" moment) can be done without asking each time.
Fresh workspace:
/agency-os scaffold
You can pass --corpora "Name1,Name2,Name3" to scaffold with custom corpus names instead of the defaults (General, Recurring). You can always add more later with /agency-os add-corpus "<name>".
Migrating from an existing Notion setup:
/agency-os scaffold
# then read existing pages, parse content, and:
# - bulk /agency-os suggest each parsed item with --corpus inferred from heading
# - copy resource pages to the new Resources page
# - archive the old structure
# this is a one-shot manual migration; the skill does not provide a generic import command,
# because the source format varies and parsing requires per-source judgment.
next shows; only start (after explicit invocation) loads the brief and flips status.kill is terminal but archival; the row remains in Notion. To truly remove, archive in Notion manually.notion-pointers.json (which scaffold writes).suggest. Manual rows added via Notion UI bypass the check.start per task at a time. In Progress is the lock./agency-os suggest -> discuss -> log/add-subtask (clarify) -> approve -> start (loads brief) -> done
^ |
| (recurring loop) |
+--------------------+
Sync runs as preflight. Notion is canonical. The pointer file is the only repo binding. The DB grid stays clean; details fold; briefs stay bounded.