Install
openclaw skills install skill-progressive-disclosure-designDecide how to split skill content between SKILL.md and reference files for context efficiency and reliable triggering. Use this whenever creating a new Claude skill, refactoring an existing one, or when a SKILL.md is growing past 300-400 lines. Also trigger when the user mentions "progressive disclosure", "reference files", "splitting skills", "skill bundling", "context window for skills", "SKILL.md too long", "what goes in references/", "skill structure", or expresses any uncertainty about where to put content within a skill. Use this even if the user phrases the question as a triggering problem ("how do I make my skill trigger better"), because that question is often confused with the splitting question and needs to be disentangled first.
openclaw skills install skill-progressive-disclosure-designEach section that recommends a direction includes explicit pros and cons. The decisions in this skill are trade-offs, not rules. The model using this skill should reason from the trade-offs to the user's specific situation rather than apply rules blindly.
Two problems get conflated and need separating before any splitting decision.
Triggering is whether Claude invokes the skill at all. Driven entirely by the YAML description. File splitting does not affect triggering. If the question is "my skill doesn't trigger reliably", do not split files, fix the description (use run_loop.py from the skill-creator skill).
Progressive disclosure is what loads after the skill activates. SKILL.md body always loads. references/* only loads when SKILL.md tells the model to read a specific file. scripts/* executes without loading into context at all. This is where context protection happens.
If the user is asking about splitting because of triggering issues, surface the confusion first and redirect.
A monolithic SKILL.md beats a split one until proven otherwise.
Split only when at least one is true:
Pros of staying monolithic:
Cons of staying monolithic:
User intent selects exactly one path. SKILL.md holds the decision logic and shared workflow. Each references/<variant>.md holds path-specific detail.
my-skill/
├── SKILL.md # decision tree + shared steps
└── references/
├── variant-a.md
├── variant-b.md
└── variant-c.md
Examples of clean variants: cloud provider, database engine, framework choice, output format, language.
Pros:
Cons:
SKILL.md holds the procedure (verbs, sequence, decisions). references/ holds lookup material queried by key.
Good reference content: schemas, error code tables, API surface listings, example galleries, configuration option matrices, design tokens.
Pros:
Cons:
SKILL.md covers the 80% case. references/edge-cases.md covers the rest.
The pointer must read like:
If you see X, Y, or Z, stop and read
references/edge-cases.mdbefore continuing.
Pros:
Cons:
For each anti-pattern, "why it appears attractive" shows what makes designers reach for it; "why it fails" shows what goes wrong in practice.
A testing skill split into unit.md, integration.md, mocks.md is a typical example.
Why it appears attractive:
Why it fails:
Why it appears attractive:
Why it fails:
Why it appears attractive:
Why it fails:
Why it appears attractive:
Why it fails:
When SKILL.md points at a reference, the pointer is the entire load contract. Rules:
go126-simd.md not advanced.md.Pros of strict pointer hygiene:
Cons of strict pointer hygiene:
For anything deterministic (formatting, validation, schema generation, file transforms, regex-heavy parsing), a script in scripts/ beats prose in references/.
Pros of scripts over reference prose:
Cons of scripts:
Before splitting any content out of SKILL.md, answer:
If the answer to question 1 is unclear, do not split.
Architecture evaluation is different from output evaluation. Output evals ask "did the skill produce the right thing?". Architecture evals ask "did the skill load the right files for the right reasons, at acceptable cost?". Same harness, different metrics. Run both. Output quality is the floor; architecture is optimization above that floor.
Pros of running architecture evals:
Cons of running architecture evals:
Output evals optimize for output quality across realistic queries. Architecture evals optimize for path coverage. The eval set must exercise every code path the skill claims to have, otherwise the metrics are noise.
Construct, at minimum:
If no realistic query triggers a given reference file, that file is dead. Inline it or delete it before running anything.
Each eval run is executed by a subagent with the skill loaded. Capture per run:
references/* files were read (parse view calls on paths inside the skill directory).scripts/* were invoked.Persist as transcript.json and loads.json per run, alongside the standard output. The harness from skill-creator already records tokens and time in timing.json; extend its grading step to extract reference loads from transcripts.
Across all eval runs, for each references/*.md:
| Observation | Action |
|---|---|
| Reference loaded in <20% of runs | Inline into SKILL.md or delete — routing overhead not justified |
| Reference loaded in 20–80% of runs | Leave split — the sweet spot; routing pays off |
| Reference loaded in >80% of runs | Promote into SKILL.md — always-load cost beats routing cost |
| Two references co-load in >70% of runs | Merge into one file |
| Reference loaded but not used in output | Fix or remove the pointer in SKILL.md |
| Reference re-read inside the same run | SKILL.md routing is unclear; clarify |
| No query triggers a reference | Delete the reference |
| SKILL.md section never referenced in any run | Delete that section |
These thresholds are starting points. Tune them based on the cost profile: small references with cheap loads tolerate lower load rates than large ones.
When choosing between architectures (monolithic vs. split, or split A vs. split B):
A split that saves 15% tokens but adds variance in output quality is worse than the monolith. Reliability beats efficiency.
run_loop.py from the skill-creator skill).The split that looked clean at design time rarely matches real load patterns. Trust the transcripts over your intuitions.
When asked to advise on a specific skill's organization: