AutomateLab agency-os

Manage and mutate Notion-based agency task and corpus data via slash commands using medium-reasoning Sonnet subagent for accurate dispatch board updates.

Audits

Pass

ClawScanPass

Agentic behavior and permission review.

Static analysisPass

Pattern checks against bundled files.

VirusTotalPass

Multi-engine malware detections and file reputation.

Install

openclaw skills install agency-os

agency-os

Notion-as-source-of-truth dispatch board. One Tasks database, one Hub page, one page per Corpus, one page each for General Guidance and Resources. The skill mutates Notion via the Notion MCP (mcp__*__notion-* tools); only references/notion-pointers.json is committed to git.

Skill name decision: the skill is named agency-os (matching the repo). All commands are /agency-os <cmd>. This is the single plugin entry point; there is no agency-os/notion sub-namespace. If you embed this plugin alongside others, prefix commands with agency-os to avoid collisions.

Harness compatibility

This SKILL.md is the authoritative spec and is harness-agnostic. The slash-command interface (/agency-os ...), the status flow, the schema, the workspace layout, and every command's behavior are the same everywhere.

Per-harness wrappers only differ in two things:

How commands are triggered. Claude Code exposes them as real slash commands via the plugin manifest. Cursor / Cline / generic MCP harnesses load this file as instructions and rely on the user typing /agency-os ... (or the natural-language equivalent) into chat. Either way, the parser is the same.
Whether mutations are delegated to a subagent. The "Execution model" section below describes Claude Code's Haiku subagent dispatch. Other harnesses run mutations directly on the main agent. The Notion MCP calls underneath are identical; only the indirection changes.

See docs/harnesses/ for per-harness setup. If you're reading this in a non-Claude-Code harness, treat the subagent dispatch instructions as optional: execute commands inline instead.

Execution model — model selection by harness

Claude Code: delegate to Sonnet (medium reasoning) for mutations, orchestrator picks for batch execution

Every /agency-os <command> invocation runs on Sonnet at medium reasoning effort via a subagent, not on the orchestrator's model. The work this skill does — resolve an ID, mutate a Notion row via MCP, format a brief — needs enough judgment (dedup checks, brief assembly, dependency reasoning) that Haiku slipped on edge cases; Sonnet at medium effort is the right balance of accuracy and cost. The orchestrator stays free for the conversation around the command.

When the user invokes /agency-os <command> <args> (or the orchestrator translates a natural-language request into one — see "Natural-language driving" below), the orchestrator's only job is to dispatch:

Agent({
  description: "Run /agency-os <command>",
  subagent_type: "general-purpose",
  model: "sonnet",
  prompt: "Run the agency-os skill for: /agency-os <command> <args>.\n\nUse medium reasoning effort — think through ID resolution, dedup checks, and brief assembly carefully, but don't over-deliberate on mechanical mutations.\n\nRead .claude/skills/agency-os/SKILL.md and execute that exact command end-to-end: sync preflight (call notion-fetch live — never read from any local cache file), resolve IDs against the live Notion result, mutate Notion via the Notion MCP, and return the same output format the skill specifies (the brief for `start`, the `+ Suggestion: ... -> url` line for `suggest`, etc.). All task/result links must be formatted as CommonMark markdown links — `[title](url)` — never HTML `<a>` tags and never a bare URL. Claude Code renders markdown but not HTML, so HTML anchors show up as literal text. If the command is `start`, also emit the full kickoff brief verbatim. If anything fails (sync, MCP call, ID resolution), stop and report — do not guess.\n\nYOU MUST ALWAYS PRODUCE OUTPUT. Never return silently. On success: the skill's standard output. On failure: one paragraph describing exactly what failed, what was attempted, and what state Notion was left in. Returning nothing is not an option."
})

The orchestrator MUST always relay the subagent's result to the user — no exceptions.

If the subagent produced output: pass it through verbatim.
If the subagent returned empty output or no output at all: say so explicitly — subagent returned no output; execution status unknown. Check Notion directly.
Never absorb the result silently. Never say "I don't have information about what happened." If you don't know, say you don't know and tell the user to check Notion.

Do not re-run any step yourself, do not "double-check" the subagent's work.

Natural-language driving stays on the orchestrator. Parsing "let's discuss the X task" into /agency-os discuss <id>, asking the user clarifying questions during a discussion, deciding when to call log vs add-subtask — that conversation runs on the orchestrator so thread context survives across turns. Only the discrete mutation (each log, each add-subtask, each approve) dispatches to the Sonnet subagent.

Rule of thumb: anything that touches Notion via the MCP -> Sonnet subagent. Anything that's deciding what to touch -> orchestrator.

Non-Claude harnesses: read models from config.json

On Cursor, Cline, Continue, and generic MCP harnesses, the skill can't spawn subagents. Instead:

Before first use: run /agency-os init to store your preferred models in .claude/skills/agency-os/config.json.
Mutations (suggest, discuss, log, etc.): run inline on the main agent.
Batch execution (/agency-os run): read config.json and use the stored models' constraints when evaluating task complexity.

The config file has this shape:

{
  "harness": "cursor",
  "models": {
    "haiku": "claude-haiku-4-5-20251001",
    "sonnet": "claude-sonnet-4-6",
    "opus": "claude-opus-4-7"
  }
}

The /agency-os run command still uses the same picker heuristic (Haiku for mechanical work, Sonnet default, Opus for strategic) — but on non-Claude harnesses, you need to ensure your harness has an API or SDK integration with those models (e.g., via the Anthropic SDK in a Continue custom tool). If your harness only supports one model, set all three to that model and the skill will use it for everything.

What lives where (hybrid contract)

This skill is the one place where the project's local-vs-Notion split is enforced. Get this wrong and you either spam MCP tokens reading what should be a file, or you put mutable state somewhere git can't roll it back.

Local, in this repo, source-of-truth: every skill spec under .claude/skills/, every doc under docs/, the notion-pointers.json binding, and references/general-guidance.md (the canonical General Guidance text — Notion's page is a one-way mirror). Agents read these directly off disk — fast, free, greppable, diffable.
In Notion, source-of-truth: task rows (Title / Status / Corpus / Priority / Impact / Type / Cadence / Effort / Parent Task), discussion logs, done logs, corpus pages (Goal + local guidance the operator authors), and the Hub itself with its DB views. Anything the operator wants to review-in-place, anything that changes frequently, anything that doesn't benefit from git log.
One-way mirror, not two-way: references/general-guidance.md -> Notion General Guidance page is the only sync direction. Edit the local file, then push to Notion (via scaffold or a manual update). Never edit the Notion page and try to reverse-sync it; drift will follow.

When this skill assembles a brief for an agent, it pulls task state from Notion (the row, the discussion log, subtask titles) and stable spec from local files (general guidance, corpus local guidance if those are mirrored, links into docs/ and .claude/skills/). The brief contains pointers to local files, not copies of them, so the agent can grep further when it needs to.

# setup
/agency-os scaffold                                       # idempotent: build / verify the workspace
/agency-os init [--harness claude-code|cursor|cline|continue|generic-mcp] [--haiku=<model>] [--sonnet=<model>] [--opus=<model>]
                                                          # configure model selection (non-Claude harnesses only; Claude Code ignores this)
/agency-os sync                                           # preflight: verify live Notion connection and pointer IDs

# suggestions
/agency-os suggest "<title>" [--corpus=<s>] [--type one-time|recurring]
                              [--cadence daily|weekly|biweekly|monthly|quarterly|yearly]
                              [--notes "..."] [--effort S|M|L|XL]

# clarification & subtasks
/agency-os discuss <id>                                   # Suggestion -> Discussion + load context for clarification
/agency-os log <id> "<entry>"                             # append a discussion entry
/agency-os add-subtask <parent-id> "<title>" [--effort=<e>] [--notes "..."] [--deps=<id1>,<id2>,...]
/agency-os approve <id>                                   # Discussion -> To-Do; cascades active subtasks

# execution
/agency-os start <id>                                     # To-Do -> In Progress + emit kickoff brief
/agency-os refresh                                        # auto-enumerate Status=To-Do AND Exec=Agent via Notion REST API, write state/todo-ids.json
/agency-os run [--go]                                     # batch-execute the To-Do sidecar; orchestrator picks model per task
/agency-os done <id> [--result-link <url>] [--note "..."]
/agency-os kill <id> [--reason "..."]

# read
/agency-os next [N] [--corpus=<s>]
/agency-os status
/agency-os list <suggestion|discussion|todo|inprogress|done|killed|recurring|all> [--corpus=<s>]
/agency-os show <id> [--section description|discussion|donelog|all] [--entry <date>]

# escape hatches
/agency-os update <id> [--title="..."] [--notes="..."] [--priority=1|2|3|4]
                       [--effort=S|M|L|XL]
                       [--type=one-time|recurring] [--cadence=...] [--corpus=<s>]
                       [--deps=<id1>,<id2>,...|none]
/agency-os move <id> --to <status>                        # force any status transition
/agency-os add-corpus "<name>" [--goal "..."]

Natural language is also a trigger. When the user says "add a suggestion to ...", "let's discuss the X task", "log: agreed to ...", "add a subtask: ...", "approve that", "make weekly task recurring", "mark X done with link <url>", "kill the Y idea" — the skill mutates Notion the same way it would for the slash command equivalent. The slash commands are the canonical interface; natural language is a convenience layer.

Status flow — the dedup gate

Suggestion --discuss--> Discussion --approve--> To-Do --start--> In Progress --done--> Done
                                                                             ^
   any  --kill-->  Killed  (terminal)                                        |
                                                                             |
                              Recurring tasks: done logs an occurrence and   |
                              loops back to To-Do with Last Done updated. ---+

Status	Meaning	Set by
`Suggestion`	Idea in the inbox; not yet discussed	`suggest`, manual Notion add
`Discussion`	Under clarification; subtasks may be emerging; not yet approved	`discuss`
`To-Do`	Approved scope; scheduled to execute	`approve`, recurring loop on `done`
`In Progress`	An agent is actively working it; brief has been emitted	`start`
`Done`	Closed (one-time only)	`done` (when `Type=one-time`)
`Killed`	Intentionally dropped	`kill`

The picker filters on Status == "To-Do", prioritising 1 then 2 then 3 then 4 then unset, and within that by Created ascending. The launcher requires Status == "To-Do". The suggestor refuses dupes against any non-terminal status by Title-Jaccard >= 0.8.

If start crashes, the row sits at In Progress. Manual recovery: /agency-os move <id> --to todo or flip Status in Notion directly.

Sync — preflight on every command

Before every command, call notion-fetch <tasks_data_source_id> (from notion-pointers.json) to get live data from Notion. Resolve <id-or-substring> against the live result. Never read from notion-cache.json or any local snapshot. Mutations also target Notion directly.

If notion-fetch fails (Notion API down, OAuth expired), print sync failed: <reason> and abort — do not fall back to any cached file.

Workspace structure

Hub  (page)
+-- intro: what this board is for
+-- General Guidance  -> page
+-- General Plan      -> table; one row per Corpus, linked to its page
+-- Suggestions Inbox  (linked DB view: Status=Suggestion, sort Created desc)
+-- In Discussion      (linked DB view: Status=Discussion, sort Created desc)
+-- To-Do (Scheduled)  (linked DB view: Status=To-Do, sort Priority asc, group by Corpus)
+-- Recurring          (linked DB view: Type=recurring, sort Last Done asc)
+-- In Progress        (linked DB view: Status=In Progress)
+-- Recently Done      (linked DB view: Status=Done, sort Done At desc, limit 25)
+-- Resources          -> page

Tasks  (database)
+-- one row per task, of any status, including subtasks

Tasks database schema

Property	Type	Notes
`Title`	title	Imperative phrase, <=80 chars
`Status`	status	`Suggestion`, `Discussion`, `To-Do`, `In Progress`, `Done`, `Killed`. Default `Suggestion`
`Type`	select	`one-time` (default), `recurring`
`Cadence`	select	`daily`, `weekly`, `biweekly`, `monthly`, `quarterly`, `yearly`. Empty for `one-time`
`Last Done`	date	Set on `done` for recurring
`Corpus`	select	One of the configured corpora
`Priority`	select	`1`, `2`, `3`, `4` — urgency (1 = blocks something this week / 2 = this month / 3 = this quarter / 4 = nice to have / default). Lower number = higher urgency. Default `4`
`Impact`	select	`low`, `medium`, `high`, `outsized` — outcome size within the task's corpus, independent of urgency. Default `medium`
`Effort`	select	`S`, `M`, `L`, `XL`. Default `M`
`Exec`	select	`none` (default), `Agent`, `Human`. Operator-set gate: only `Agent`-marked To-Do rows enter the `run` queue
`Parent Task`	relation (self)	Set on subtasks; empty for top-level tasks
`Subtasks`	rollup	Auto-rolled from inverse of Parent Task; surfaces count
`Created`	created_time	Automatic
`Done At`	date	Set on terminal `Done` (one-time)
`Result Link`	url	Live link, post URL, PR URL, etc.
`Tags`	multi_select	Cross-cutting concerns (user-defined)
`Dependencies`	relation (self)	IDs of tasks that must reach `Done` before this row is dispatchable. Used only by `run` to stage execution; ignored by `start`, `next`, `list`. Empty by default

Task page body

Every task page uses toggleable H2 sections so the DB grid stays clean and details fold on click. Created via Notion's is_toggleable: true heading flag (Notion API). If the MCP path can't set the flag, fall back to plain H2; the user can toggle manually.

> Launch this task
> Paste in Claude Code: `/agency-os start <id>`

Description     <- starts expanded
   <freeform: what to do, why, acceptance criteria, links to docs/skills>

Subtasks        <- starts expanded if any subtasks exist, else hidden
   (linked DB view: Parent Task = this, sort Created asc)

Discussion log  <- starts collapsed
   ### 2026-01-10 — initial clarification
   Q: ...
   A: ...
   Decisions:
   - ...

Done log        <- starts collapsed; only meaningful for recurring
   ### 2026-01-12: completed by agent — link <url>

Related         <- starts expanded; one-line each
   Corpus: [-> <corpus>](<corpus-url>)
   General guidance: [-> Guidance](<guidance-url>)
   Parent: [-> <parent-title>](<parent-url>)   <- only for subtasks

Corpus pages

# Corpus: <name>

## Goal
1-3 sentences on what "done" looks like for this whole corpus.

## Local guidance
Conventions, owners, references. Anything an agent needs before its first task here.

## Tasks
(linked DB view: Corpus = this, group by Status)

General Guidance page

Project-wide rules: link into docs/, link into .claude/skills/, the launch flow, house style. Kept short. Links beat duplication. Seeded from references/general-guidance.md — edit the local file, push the mirror.

Pointer + cache files

.claude/skills/agency-os/references/notion-pointers.json (committed):

{
  "hub": { "page_id": "<uuid>", "url": "...", "title": "..." },
  "tasks_database": {
    "database_id": "<uuid>",
    "data_source_id": "<uuid>",
    "url": "...",
    "task_id_prefix": "OS"
  },
  "guidance": { "page_id": "<uuid>", "url": "...", "title": "..." },
  "resources": { "page_id": "<uuid>", "url": "...", "title": "..." },
  "corpora": {
    "General": { "page_id": "<uuid>", "url": "...", "title": "General" }
  },
  "hub_views": { "Suggestions Inbox": "view://<uuid>", ... },
  "corpus_views": { "General": "view://<uuid>" },
  "schema_summary": { ... }
}

Command: `init [--harness=...] [--haiku=...] [--sonnet=...] [--opus=...]`

Non-Claude harnesses only. Configure which models to use for task execution. Claude Code harnesses ignore this; they have built-in model selectors.

Why it matters: Cursor, Cline, Continue, and generic MCP harnesses can't spawn subagents with different models on the fly. Instead, store your preferred models upfront in .claude/skills/agency-os/config.json, then the skill uses them during batch execution.

Interactive mode (recommended):

/agency-os init

Prompts:

Which harness are you using? (or auto-detect from environment)
For easy mechanical tasks (form fills, recurring routines), which model? (default: haiku-4-5)
For medium substantive work (drafting, audits, revisions), which model? (default: sonnet-4-6)
For hard strategic work (design, multi-skill reasoning), which model? (default: opus-4-7)

Stores in .claude/skills/agency-os/config.json and prints config: created -> <path>.

Non-interactive mode:

/agency-os init --harness cursor --haiku claude-haiku-4-5 --sonnet claude-sonnet-4-6 --opus claude-opus-4-7

If a harness doesn't support a model (e.g., Cursor is configured for only Sonnet), pass the model it does support for all three tiers; the skill will use it for everything.

If config.json already exists, re-running init overwrites it. To reset: /agency-os init interactively.

Command: `scaffold`

Idempotent setup. If notion-pointers.json exists and every ID resolves via notion-fetch, prints scaffold: already in place and exits. Otherwise creates only what's missing.

Locate or create the Hub. Search by title; create at workspace root if absent.
Create the Tasks database as a child of the Hub, with the schema above. Capture both database_id and data_source_id. The Dependencies property is a self-relation on Tasks (separate from Parent Task — that's the hierarchy relation; Dependencies is the gating relation used only by run).
Create the General Guidance page under the Hub; seed body from references/general-guidance.md.
Create each Corpus page under the Hub. Default set: General, Recurring (user can customise via --corpora flag or add later with add-corpus). Seed each from references/corpus-template.md.
Create the Resources page under the Hub; seed from references/resources.md if present.
Add linked DB views to the Hub for Suggestions Inbox, In Discussion, To-Do, Recurring, In Progress, Recently Done. Add a per-corpus filtered view to each Corpus page. Every view's SHOW clause must include Task ID as the leftmost column so subtask IDs are reachable at a glance.
Wire interlinks (Hub <-> Guidance <-> Corpora <-> Resources).
Write notion-pointers.json.
Full-width pages — manual one-time toggle. The Notion MCP doesn't expose page_full_width, so Hub / Guidance / Resources / each Corpus page / new task pages need the ... menu -> "Full width" toggle flipped on by the operator after creation. Surface this in scaffold's final output so the operator knows to do it.
Sub-items — manual one-time toggle. The Notion MCP doesn't expose the Sub-items setting either. On the Tasks DB, the operator must enable Sub-items and wire it to the existing Parent Task (parent) / Subtasks (children) relation. Path varies by Notion version: typically ... menu -> "Sub-items" -> pick Parent Task. Once wired, every view in the Hub gets a chevron on rows with children.
Print the Hub URL.

add-corpus "<name>" extends the General Plan post-scaffold: appends a Corpus select option, creates the page, adds the filtered view, updates pointers. Print: + Corpus: [<name>](<url>).

Command: `suggest "<title>" ...`

Add a row in Suggestion status.

Sync preflight.
Validate: --corpus is in pointers (else list and refuse); if --type=recurring, --cadence is required.
Dedup check: refuse if Title-Jaccard >= 0.8 against any row with status in {Suggestion, Discussion, To-Do, In Progress}.
notion-create-pages with parent = Tasks data source. Properties: Title, Status=Suggestion, Corpus, Type, Cadence (if recurring), Effort. Page body = task-page-template.md rendered.
If --notes provided, write into the Description section.
Print: + Suggestion: [<title>](<url>).

Command: `discuss <id>`

Begin clarification on a Suggestion. Flips status to Discussion and prepares the agent to ask clarifying questions.

Sync preflight.
Resolve <id> (UUID, URL, or unique Title substring against Suggestion rows).
notion-update-page Status -> Discussion.
Print the discussion brief: row properties + Description section. End with: Ready to clarify. Ask your questions or paste new requirements; the skill will log them with /agency-os log <id> and create subtasks with /agency-os add-subtask <id>. [Open task](<task-url>)
The agent in this conversation now drives the clarification: it asks questions, accepts user answers, calls log for each round, calls add-subtask whenever the user's responses imply concrete new work.

discuss does not require status to be Suggestion — calling it on an already-Discussion row is fine and reloads the brief.

Command: `log <id> "<entry>"`

Append a discussion entry to the task page.

Sync preflight.
Resolve <id>.
Find or create the Discussion log toggle on the page. Append a new dated entry:
```
### <YYYY-MM-DD> — <auto-summary first 6 words of entry>
<entry>
```
Print: + Logged: [<title>](<url>).
If the entry contains lines starting with + they're treated as proposed subtasks and surfaced in the output: note: detected N proposed subtasks; create with /agency-os add-subtask <id> "<title>".

The agent should call log multiple times during a discussion — once per Q/A round, or once per cohesive thought — rather than dumping a single megalogue at the end. This keeps the log queryable by date.

Command: `add-subtask <parent-id> "<title>" ...`

Create a subtask row.

Sync preflight.
Resolve <parent-id>. Refuse if parent's status is Done or Killed.
notion-create-pages with parent = Tasks data source. Properties: Title, Status = parent's status (Discussion or To-Do typically), Corpus = parent's corpus, Parent Task = parent, Type = one-time, Effort from flag or default, Dependencies from --deps if provided (each id resolved live via Notion; refuse if any id is unknown).
Append a line to the parent's Discussion log section: ### <date> — subtask added: [<title>](<url>).
Print: + Subtask of <parent-title>: [<title>](<url>){ deps=N}.

Subtasks can have their own subtasks (nesting allowed; the skill doesn't enforce a depth limit but warns at depth >= 3).

Structuring work — parent vs subtask vs log entry

The hardest part of using this board well is deciding what shape a piece of work takes. Three rules:

1. A task is a coherent unit with a clear "done" state. It can be shipped, merged, published, decided. "Set up X integration" is a task. "Think about X" is not.

2. A subtask is a child task that can be completed independently and is bounded by a deliverable, not by a step. "Write the onboarding blurb" is a subtask because the blurb is a separable artifact with its own done. "Click submit on the form" is not a subtask — log it in the discussion or done note instead.

Rule of thumb: if you'd naturally hand it to a different agent on a different day, it's a subtask. If you'd do it inline while working the parent, it's a step.

3. A log entry is a decision, clarification, or update on existing scope. "Decided to launch Tue not Wed" is a log entry. "Operator handles the review thread" is a log entry, not a subtask.

Depth

Top-level: standalone work or a top-level container.
Subtask (depth 1): the normal case. Most subtasks live here.
Nested subtask (depth 2): legitimate when a parent has multiple major deliverables, each with its own breakdown.
Depth 3+: the skill warns. Almost always means the hierarchy should be flattened or split into separate top-level tasks.

The "move this chat to Notion" workflow

When the user says "save this to Notion", "make this a task", "track this in Notion", "capture this conversation" mid-chat, read the conversation as a tree, not a transcript:

Identify the parent task. -> /agency-os suggest "<title>" --corpus=<inferred>.
Open it for discussion immediately. -> /agency-os discuss <id>.
Log the rationale and major decisions as one or two distilled entries. -> /agency-os log <id> "<distilled>".
Carve out subtasks for each separable deliverable. -> /agency-os add-subtask <id> "<title>" per item.
Stop and ask before approving. End with: Captured to <url> in Discussion. Approve when you're ready to schedule.

Command: `approve <id>`

Promote a task from Discussion -> To-Do, cascading active children.

Sync preflight.
Resolve <id>. Verify status is Discussion (or Suggestion — fast-track allowed). Refuse otherwise.
Cascade: collect every descendant with status in {Suggestion, Discussion}. For each, set Status -> To-Do.
Set the parent's Status -> To-Do. If --priority provided, set on the parent only.
Append a ### <date> — approved entry to the parent's Discussion log.
Print: -> To-Do: [<title>](<url>) (cascaded N subtasks).

Command: `start <id>` (alias: `launch`)

Move To-Do -> In Progress and emit the kickoff brief.

Sync preflight.
Resolve <id> (UUID, URL, or substring against To-Do rows).
Verify status is To-Do. If Suggestion -> refuse with discuss it first. If Discussion -> refuse with approve it first. If In Progress -> soft-allow (re-emits brief).
notion-update-page Status -> In Progress.

Assemble the kickoff brief, in this exact order:

## Task
[<title>](<notion-url>)  [<corpus> / <priority> / type=<type>{ cadence=<cadence>}{ last_done=<date>} / effort=<effort>]

## Description
<Description toggle body>

## Subtasks (N)
- [Status]  <subtask title>  ->  /agency-os start <subtask-id>
- ...

## Latest discussion entry  (of <K> total)
<most-recent entry verbatim>
(For older entries: /agency-os show <id> --section discussion --entry <date>)

## Corpus: <name>
<Goal + Local guidance from the corpus page>

## General guidance
<full general guidance page body>

End with: Brief loaded. Proceed.

Overfeed protection. The brief never embeds:

Older discussion entries (only the latest; the rest are referenced by date)
Sibling tasks
Subtask bodies (only their titles + status)
The Done log

Command: `refresh`

Auto-enumerate the agent-runnable To-Do set and write it to state/todo-ids.json. No arguments. The operator's only job upstream is to mark rows in Notion with Exec=Agent; refresh then fetches them via the Notion REST API and the sidecar is the enumeration substrate for run.

The currently installed Notion MCP does not expose property-filtered enumeration of a data source, so this command shells out to scripts/query-tasks.py, which posts to POST /v1/data_sources/{id}/query with a server-side Status="To-Do" AND Exec="Agent" filter (Notion API version 2025-09-03). The integration token (NOTION_KEY in .env) must be shared with the Tasks database.

Run:

python .claude/skills/agency-os/scripts/query-tasks.py

The script:

Loads NOTION_KEY from .env and tasks_database.data_source_id from references/notion-pointers.json.
Queries the data source with the two-gate filter, paginating through has_more/next_cursor.
For each result, fetches the page's block children once to extract a description_preview (the text between the Description H2 and the next H2; first 200 chars).

Writes .claude/skills/agency-os/state/todo-ids.json:

{
  "refreshed_at": "<iso>",
  "tasks": [
    {
      "id": "<uuid>",
      "url": "https://www.notion.so/...",
      "title": "...",
      "corpus": "General",
      "priority": "3",
      "effort": "M",
      "type": "one-time",
      "cadence": null,
      "last_done": null,
      "exec": "Agent",
      "parent_task_id": null,
      "has_todo_subtasks": false,
      "description_preview": "<first 200 chars of Description>",
      "dependencies": [
        { "id": "<uuid>", "status": "Done" },
        { "id": "<uuid>", "status": "To-Do" }
      ]
    }
  ]
}

Prints a summary: refreshed: <N> agent-runnable To-Do tasks -> state/todo-ids.json followed by one line per task.

Failure modes. The script aborts with a non-zero exit and an explanatory message if NOTION_KEY is missing, the integration is not shared with the database, or the API returns an error. The existing sidecar is overwritten only after the query succeeds end-to-end.

Command: `run [--go]`

Batch-execute every task in state/todo-ids.json (which only contains rows with Status == "To-Do" AND Exec == "Agent").

Claude Code: The Haiku subagent builds the plan from the sidecar; the orchestrator picks a model per task at runtime and spawns execution agents.

Non-Claude harnesses: The skill reads config.json to determine available models. If config.json doesn't exist, run /agency-os init first.

Auto-refresh. run always calls refresh as its first step. If refresh fails, run aborts.

Plan phase (Haiku subagent)

Execute scripts/query-tasks.py via Bash — this is mandatory and must happen before reading anything. Run python .claude/skills/agency-os/scripts/query-tasks.py and verify it exits 0. Only then read the freshly written state/todo-ids.json. Never read the sidecar without running the script first; the file on disk is always stale.
Dedup containers. For each row with has_todo_subtasks: true, skip the parent — its work IS its subtasks.
Resolve dependencies. Each sidecar row carries dependencies: [{id, status}]. For every dep:
- status == "Done" -> satisfied, ignore.
- dep id is in the current in-batch set -> record as an intra-batch edge.
- otherwise -> external blocker: drop from dispatch plan, collect into blocked_deps[].
Topological stage assignment. Build a DAG from intra-batch edges; assign each task a stage = 1 + max(stage of its in-batch deps) (stage 1 = no in-batch deps). If a cycle is detected, abort.
Sort within each stage: Priority asc, then overdue-recurring first, then Effort asc.
Return the plan to the orchestrator as stages: [[(id, title, corpus, effort, parent_id, description_preview), ...], ...] plus blocked_deps.

Dispatch phase (orchestrator)

Before spawning any execution agent, the orchestrator prints the plan outline (see ### Output below) so the user sees which tasks are about to fire, in which stages, with which model per task. Then dispatching stage 1... and dispatch begins.

Stages run sequentially: every task in stage N must finish before stage N+1 starts. Within a stage, tasks fan out in parallel.

If any task in stage N closes as not Done, stage N+1 tasks that depend on it are dropped; added to the run summary's blocked-deps. Stage N+1 tasks whose deps all closed Done still run.

Claude Code: The orchestrator picks a model and spawns an execution agent per task.

Non-Claude harnesses: The skill reads config.json to determine which models are available, then suggests a complexity level (easy/med/hard) for each task based on the same heuristic.

Picker heuristic (same on all harnesses):

Haiku (easy) — mechanical, template-driven, single-skill: form filings, recurring routines, log-and-close mutations, anything that's "fill a form / file a PR / post a comment from a known template."
Sonnet (medium, default) — substantive content/comms work, judgment-bearing audits, multi-step drafting, anything that needs a draft + revision pass.
Opus (hard) — strategic design, multi-skill orchestration, hard reasoning. Rare.

Cap concurrency at 5 parallel execution agents per stage.

Execution-agent contract

Status discipline is non-negotiable. Every spawned agent MUST leave the row in a terminal-for-this-run state before returning. No exceptions, no "leave it at In Progress for the operator to see":

Outcome	Final Status	Closer command
Full completion	`Done` (one-time) / `To-Do` w/ Last Done bumped (recurring)	`/agency-os done <id> --result-link <url> --note "..."`
Partial completion	`Discussion`	`/agency-os move <id> --to discussion` + `/agency-os log <id> "partial: <what's left>"`
Blocked on operator action	`Discussion`	`/agency-os move <id> --to discussion` + `/agency-os log <id> "blocked-operator: <what operator must do>"`
Needs clarification	`Discussion`	`/agency-os move <id> --to discussion` + `/agency-os log <id> "needs-clarification: <question>"`
Failed (crash, tool error, dead-end)	`Discussion`	`/agency-os move <id> --to discussion` + `/agency-os log <id> "failed: <what broke>"`

Rationale: leaving a row at In Progress after the agent has stopped working is a lie about live state. The dashboard ends up cluttered with rows nothing is actually working on, and the next run can't tell whether to retry. Discussion is the correct holding pen for "an agent looked at this and could not close it" — the operator sees a real queue of things needing attention, and a follow-up /agency-os approve <id> is the explicit "try again" signal.

Every spawned agent:

Calls /agency-os start <id> to load the kickoff brief AND flip the row To-Do -> In Progress. This MUST be the first call; if the row is already In Progress (re-dispatch), start is idempotent and re-emits the brief.
Runnability check: can this task plausibly be completed end-to-end by an agent, or does it require operator action (logging into a personal account, solving a captcha, clicking publish in a UI without API access)?
- If operator-only: call /agency-os log <id> "blocked-operator: <one-line what the operator must do>", then /agency-os move <id> --to discussion, then emit the result report (step 6) with status: blocked-operator. Do not skip the status flip.
Otherwise: execute the brief end-to-end.
Self-assessment. Before closing: did I complete 100% of the acceptance criteria? Partial completions are not Done.
Auto-close — required, every outcome. Pick the closer command from the table above and run it BEFORE emitting the result report. Verify the closer returned success. If the closer itself errors (e.g. Notion API hiccup), retry once; if it still fails, surface that in the result report's summary line as status: failed with the closer error appended — but still emit the report.

Result report — required, every run, no exceptions. The agent's final chat output MUST be a single block in this exact format. Returning nothing is not allowed — not on success, not on failure, not on a crash mid-execution. If the agent hit an unrecoverable error before it could do anything meaningful, it still emits the block with status: failed and describes what happened. The status: line in the report must agree with the final Notion status: done <-> Done, every other status <-> Discussion.

### <task-id> — [<title>](<notion-url>)
status:       done | blocked-operator | needs-clarification | failed
model:        haiku | sonnet | opus
result-link:  <url or ->
summary:      <1-2 sentences: what was done, or what blocked it>
next-step:    <only if status != done; what operator should do next>

#### Full output
<verbatim full output of the agent's execution — every step taken, every tool result summary, every decision made. Do not truncate. If the agent produced no meaningful output beyond the header fields above, write "(no output)" here.>

Orchestrator accountability. After each stage, the orchestrator must confirm it received a result block from every agent it spawned. If an agent returned empty output or no output:

Treat it as status: failed, summary: agent returned no output, full output: (agent returned no output).
Include it in the run summary under ❌ failed with that note.
Never omit a task from the summary because its agent was silent.

Parent-cascade rule

When a subtask transitions To-Do -> In Progress (via start), the skill also flips its parent To-Do -> In Progress, if the parent is currently To-Do. The parent stays In Progress until the operator (or a deliberate later /agency-os done <parent-id>) closes it.

Output

The plan outline is ALWAYS printed first — both in dry-run (without --go) and in real dispatch (with --go). The user must see what's about to fire before any agent spawns. In --go mode, after printing the outline, immediately proceed to dispatch — do not pause for confirmation (the --go flag already is the confirmation).

Emit the outline as plain markdown (no fenced code block, so the links are clickable):

plan (<N> tasks, <S> stages):

stage 1 (<K> tasks, parallel):
- [haiku] title
- [sonnet] title
- ...
stage 2 (<L> tasks, parallel, after stage 1):
- [haiku] title — deps: dep-title
- ...

If any tasks were dropped for external blockers, follow with:

blocked-deps (<B> tasks, not dispatched):

title — missing: dep-title (or raw dep-id if unknown)

If blocked-deps is non-empty, also print: note: <B> task(s) have dependencies outside this batch. Approve the missing deps or run them first, then /agency-os run again.

In dry-run mode, the outline IS the entire output — stop here, fire nothing.

In --go mode, after the outline, print a one-line marker — dispatching stage 1... — then begin dispatch. After completion, the orchestrator emits two more sections:

1. Per-task detail — one block per task executed, in stage order, verbatim from each execution agent's result report:

---
### <task-id> — [<title>](<notion-url>)
status:       done | blocked-operator | needs-clarification | failed
model:        haiku | sonnet | opus
result-link:  <url or ->
summary:      <...>
next-step:    <...>

#### Full output
<verbatim agent output>
---

2. Run summary — after all per-task blocks. Do NOT wrap the summary in a fenced code block (```), because markdown links inside code fences render as literal text in Claude Code. Emit the summary as plain markdown so [title](url) links are clickable:

run summary (T queued, S stages):

✅ done (<N>): title, title, ...
🟡 needs operator (<M>): title, ...
🟡 needs clarification (<P>): title, ...
🟡 blocked-deps (<B>): title (dep: dep-title), ...
❌ failed (<Q>): title, ...

Omit any row whose count is 0.

blocked-deps entries also surface which dep blocked them. The orchestrator must emit both sections — the per-task detail AND the summary — every time --go is used.

Command: `done <id>`

Close a task. Branches on Type.

Sync preflight.
Resolve <id> against rows in In Progress (preferred) or To-Do (allowed).
If Type == "one-time":
- Status -> Done; Done At -> today; Result Link -> --result-link if given.
- Append ### <date>: done — <note or "(no note)"> to the Done log.
- For subtasks: if all siblings are now Done, surface a nudge: note: all subtasks of <parent> are done — consider /agency-os done <parent>.
If Type == "recurring":
- Status -> To-Do (loops back).
- Last Done -> today.
- Append ### <date>: done — <note> to the Done log.
Print: ✅ Done: [<title>](<result-link>) (if no result-link, omit the link and print title as plain text).

Command: `kill <id> [--reason "..."]`

Terminal drop.

Sync preflight.
Resolve <id> against any non-terminal row.
Status -> Killed. Append ### <date>: killed — <reason> to the Done log.
Cascade: for every descendant in non-terminal status, also set Status -> Killed with reason parent killed.
Print: ✗ Killed: [<title>](<url>) ({<reason>}) (cascaded N descendants).

Command: `next [N] [--corpus=<s>]`

Show top N (default 3) To-Do tasks. Does not execute.

Sync preflight.
Filter: Status == "To-Do" and (if --corpus given) matching corpus and Parent Task IS NULL (only top-level).
Sort by Priority (1 first; unset last), then Type (recurring with overdue Last Done first), then Created ascending.
Print N rows: <idx>. [<priority>][<corpus>][<type>{<cadence>}][<effort>] [<title>](<url>) (/agency-os start <id>).

For recurring tasks, "overdue" = now - Last Done exceeds the cadence interval (daily=1d, weekly=7d, biweekly=14d, monthly=30d, quarterly=90d, yearly=365d).

Command: `status`

Compact health overview.

Sync preflight.
Print:
- Counts per Status.
- Counts per Corpus x Status (small table).
- Top 5 Suggestions (most recent).
- Top 5 To-Do (highest priority + overdue recurring).
- In Progress rows (flag stale if older than 7 days).
- Done This Week count.
- Recurring tasks overdue count.

Command: `list <view> [--corpus=<s>]`

Output: [Status][Type][Priority][Corpus] [<title>](<url>).

Subtasks are shown indented under their parent unless --flat is passed.

Command: `show <id> [--section ...] [--entry <date>]`

Read a task's content without mutating state. Always include a [<title>](<url>) link at the top of the output before showing the requested content.

Default: row properties + Description + subtask titles.
--section description: Description only.
--section discussion [--entry <date>]: discussion log; one specific entry if --entry given.
--section donelog: done log entries.
--section all: everything (full page body + subtask titles).

Command: `update <id> ...`

Mutate properties without changing status. All flags optional. Print: ✓ Updated: [<title>](<url>).

--notes "..." replaces the Description toggle body. To append without replacing, use log instead.

--deps=<id1>,<id2>,... replaces the Dependencies relation (each id resolved live via Notion; refuse if any is unknown). --deps=none clears it. Self-reference and cycles are refused.

Command: `move <id> --to <status>`

Force-set status to any value (escape hatch). Bypasses normal flow gates. Logs a ### <date>: forced move <from> -> <to> line to the Discussion log. Print: -> <status>: [<title>](<url>).

Natural-language driving

When the user says these things in chat, the skill translates to commands:

User says	Skill calls
"add a suggestion: <title>" / "new idea: <title>"	`suggest "<title>"` (asks for corpus if ambiguous)
"let's discuss X" / "open X for discussion"	`discuss <id>`
"log: <thing>" / "note that <thing>" (during discuss)	`log <id> "<thing>"`
"add a subtask: <title>" / "we'll also need to <thing>"	`add-subtask <parent-id> "<title>"`
"approve" / "approve it" / "ship it" / "go ahead"	`approve <id>`
"start X" / "launch X" / "let's do X now"	`start <id>`
"run todo" / "run all" / "execute the queue"	`run` (dry-run) -> user reviews -> `run --go`
"X is done [link <url>]" / "mark X done"	`done <id> [--result-link <url>]`
"kill X" / "drop X" / "X is no longer relevant"	`kill <id>`
"make X recurring weekly" / "X is recurring monthly"	`update <id> --type=recurring --cadence=weekly`
"what's next" / "what should I work on"	`next [N]`
"show me X" / "details on X"	`show <id>`
"what's the status" / "where are we"	`status`

The skill is the user's hands. When in doubt, the agent asks: "Should I <command equivalent> for that?" and only mutates after confirmation. But trivial appends (a log entry during an open discuss session, an add-subtask from an explicit "we'll need to" moment) can be done without asking each time.

Bootstrapping

Fresh workspace:

/agency-os scaffold

You can pass --corpora "Name1,Name2,Name3" to scaffold with custom corpus names instead of the defaults (General, Recurring). You can always add more later with /agency-os add-corpus "<name>".

Migrating from an existing Notion setup:

/agency-os scaffold
# then read existing pages, parse content, and:
#   - bulk /agency-os suggest each parsed item with --corpus inferred from heading
#   - copy resource pages to the new Resources page
#   - archive the old structure
# this is a one-shot manual migration; the skill does not provide a generic import command,
# because the source format varies and parsing requires per-source judgment.

What this skill does NOT do

Does not auto-execute To-Do items. next shows; only start (after explicit invocation) loads the brief and flips status.
Does not write content, deploy, or run any other skill. It mutates Notion. The brief may point an agent at another skill, but this skill never invokes one.
Does not delete content. kill is terminal but archival; the row remains in Notion. To truly remove, archive in Notion manually.
Does not commit or push to git beyond notion-pointers.json (which scaffold writes).
Does not enforce subtask completion before parent done. It surfaces a nudge, not a refusal.
Does not de-dup beyond Title-Jaccard 0.8 on suggest. Manual rows added via Notion UI bypass the check.
Does not auto-archive old discussion entries. The Discussion log grows unbounded on the page; the brief assembler bounds the agent's context (latest entry only, older ones referenced).
Does not run more than one start per task at a time. In Progress is the lock.

One-command summary

/agency-os suggest -> discuss -> log/add-subtask (clarify) -> approve -> start (loads brief) -> done
                                                                          ^                     |
                                                                          |  (recurring loop)   |
                                                                          +--------------------+

Sync runs as preflight. Notion is canonical. The pointer file is the only repo binding. The DB grid stays clean; details fold; briefs stay bounded.