Install
```bash
openclaw skills install codebase-survey
```

Survey an existing codebase to understand its structure, scope, architecture, and current state. Trigger on 'deep dive', 'explore this codebase', 'survey the project', 'understand the architecture', 'onboard me to this project', 'walk me through this repo', or 'what are we working with here'. Use before planning or implementing on an unfamiliar or long-neglected codebase.
Systematically discover what an existing codebase contains, how it's organized, and where the complexity lives. Produces a synthesized report rather than a raw file dump.
Many projects (especially this user's) maintain a CLAUDE.md file with a "Where to find things" map. If the project has a well-structured CLAUDE.md, read it FIRST — before any of the steps below. Use the "Where to find things" map as your primary survey guide. Steps below become fallbacks for areas the CLAUDE.md doesn't cover.
A rich CLAUDE.md maps the codebase for you, which is more efficient than running a generic breadth-first scan. Only fall back to the generic workflow if the CLAUDE.md is absent, sparse, or stale.
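A minimal sketch of that check, assuming the map lives under a "Where to find things" heading (this user's convention, not a standard):

```bash
# Sketch: decide whether CLAUDE.md can drive the survey.
# Adjust the sed pattern to the project's actual section title.
if [ -f CLAUDE.md ]; then
  wc -l CLAUDE.md                                    # a very short file is likely sparse
  sed -n '/Where to find things/,/^## /p' CLAUDE.md  # print the survey map
else
  echo "No CLAUDE.md; fall back to the full breadth-first survey"
fi
```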
When: The user asks for a "deep dive" into a SPECIFIC domain or feature within an already-familiar project (e.g., "deep dive of booking import and creation"), not the full codebase. The project's CLAUDE.md context is already loaded or available.
Not when: The project is unfamiliar — do the full codebase survey first.
Not when: The user just has a narrow question about one file — read that file directly.
After warmup (CLAUDE.md, CLAUDE.local.md, recaps, active plans loaded):
Read the relevant feature doc(s) — the docs/feature-<slug>.md or docs/technical-documentation.md section for that domain. This gives you the contract: what it's supposed to do, API schemas, prompt strategy, business rules, acceptance criteria.
Read the Prisma schema model(s) — the model Booking { ... } block. Note every field, relation, index, and @map column name. Cross-reference against the feature doc's field descriptions.
Read ALL API routes under the domain — every file in src/app/api/<domain>/route.ts and any sub-routes. In a booking import system, that's extract, reprocess, and import — three routes that chain together. Note validation patterns, auth guards, feature-flag guards, and transaction patterns.
Read the pipeline/infrastructure files — the shared library files that the routes depend on: AI clients, extractors, prompt parsers, matchers, loggers, cost calculators. These will contain the actual implementation depth (vision vs. text extraction, spatial ordering, fuzzy matching algorithms).
Read the UI components — the page and review component. Note the data flow: how extracted data travels from API → UI state → submit. Pay special attention to validation UX, error states, partial submission, and dormant features (like hidden reprocess buttons).
Synthesize into a structured report:
Cover, at minimum, the src/lib/ai/ pipeline and how it's structured for reuse. Format: plain text or markdown. Do NOT paste raw file contents — synthesize. The report should convey the full data flow from input to persistence.
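A sketch of how the read list for the steps above might be gathered, using the booking-import example; every path and the model name are illustrative assumptions, not fixed conventions:

```bash
# Sketch: enumerate the deep-dive read list (booking-import example; adjust
# the domain slug, model name, and directory layout to the actual project)
ls docs/feature-booking*.md 2>/dev/null              # feature doc(s)
awk '/^model Booking /,/^}/' prisma/schema.prisma    # the domain's model block
find src/app/api -path "*booking*" -name "route.ts"  # every API route in the domain
find src/lib -path "*booking*" -o -path "*/ai/*"     # pipeline/infrastructure files
find src/app -path "*booking*" -name "*.tsx"         # page and review components
```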
| Aspect | Full codebase survey | Targeted domain deep dive |
|---|---|---|
| Scope | Entire project | One domain/feature |
| Approach | Breadth-first (14 steps) | Depth-first (6 steps) |
| Reading pattern | Representative samples | Every file in the domain |
| Output | Project-level state summary | Domain-level architecture + data flow |
| Precondition | Project unfamiliar | Already worked on this project |
Run these steps in order. At each step, synthesize what you found before moving on. The final output is a cohesive report, not a concatenation of individual file reads.
```bash
git branch -a
git log --oneline -20
```
Look for:
Goal: Understand the project's cadence and current position in the commit graph.
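If branch staleness matters, one way to order branches by last commit (a sketch; works in any git repo):

```bash
# Sketch: branches ordered by most recent commit, to spot stale work
git for-each-ref --sort=-committerdate \
  --format='%(committerdate:short) %(refname:short)' refs/heads
```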
```bash
ls -la
find src -type f | head -60
echo "---TOTAL---" && find src -type f | wc -l
```
Break down by major directory:
Goal: Know how big the codebase is and where the files live.
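A sketch of the per-directory breakdown mentioned above, assuming a src/ root (adjust for other layouts):

```bash
# Sketch: file counts per top-level directory under src/
for d in src/*/; do
  printf '%6d  %s\n' "$(find "$d" -type f | wc -l)" "$d"
done | sort -rn
```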
Read package.json (or equivalent) and README.md.
Capture:
Goal: Confirm tech stack and entry points.
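For Node projects, a sketch of pulling the high-signal fields in one pass (assumes jq is available):

```bash
# Sketch: name, scripts, and dependency list from package.json
jq '{name, scripts, dependencies: (.dependencies // {} | keys)}' package.json
```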
Read the schema definition (e.g., prisma/schema.prisma, schema.sql, models.py).
Produce a table summary:
| Table | Purpose | Key relationships |
|---|---|---|
Note soft-delete conventions, multi-tenancy fields, and any unusual patterns.
Goal: Understand the data model and how entities relate.
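For a Prisma schema, a rough inventory can come from the annotations alone (a sketch; adapt the patterns for schema.sql or models.py):

```bash
# Sketch: list models, relations, indexes, and column mappings
grep -nE '^model |@relation|@@index|@map' prisma/schema.prisma
```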
Read the project's contract docs — typically technical-documentation.md, functional-specifications.md, implementation-plan.md. Read only if they exist and are reasonably sized (< ~600 lines each). For very large docs, read the table of contents and the "Today's state" or "Status" sections.
Capture:
Goal: Understand what the system is supposed to do and what's already built.
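A sketch for applying the ~600-line threshold before committing to a full read (assumes the docs live under docs/):

```bash
# Sketch: doc sizes, largest first; anything near 600+ lines gets a TOC-only read
wc -l docs/*.md 2>/dev/null | sort -rn
```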
Review the project's access control architecture — focusing on what is protected, not the specific secret values.
Capture:
Do NOT read files containing actual secrets (credential files, .env, CLAUDE.local.md, API key configs, JWT secret values, OAuth client secrets, etc.). Read only the non-secret surfaces: middleware, route guards, and permission checks. The goal is understanding who can access what, not finding credential values.
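One way to locate the guard code by file name without opening anything secret-bearing (a sketch; the name patterns are guesses, adjust to the project's conventions):

```bash
# Sketch: find auth-related source files by conventional names
find src -type f \( -iname "*auth*" -o -iname "*middleware*" -o -iname "*guard*" \) \
  -not -name "*.env*" | sort
```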
Read 3–5 of the most consequential utility files:
Do NOT read: files containing actual credential values, API keys, encryption secrets, or password hashes. Read the interface types and middleware patterns only.
Goal: Understand the platform's shared infrastructure and guardrails.
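A sketch for surfacing the most consequential lib files by import frequency, assuming a @/lib path alias (adjust the pattern to the project's import style):

```bash
# Sketch: rank src/lib modules by how often they are imported
grep -rohE "from '@/lib/[^']+'" src | sort | uniq -c | sort -rn | head -10
```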
```bash
find src/app -type d | sort   # or pages/, routes/, controllers/
```
Map the routing structure. Note nested routes, parallel routes, route groups.
Goal: Visualize the URL surface and page hierarchy.
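A sketch that indents the directory listing into a rough route tree:

```bash
# Sketch: render the route hierarchy by indentation depth
find src/app -type d | sort | sed 's|[^/]*/|  |g'
```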
Count files per category:
```bash
find src/components -type f | wc -l
find src/app/api -type f | wc -l
find src/lib -type f | wc -l
```
Read a representative sample (1–2 files) from each major component area if the user hasn't specified a focus area.
Goal: Know where UI components and API routes live.
List test files and utility scripts.
```bash
find tests -type f | sort
find scripts -type f | sort
```
Note test quantity, framework, and coverage. Note any data migration/import scripts.
Goal: Understand test infrastructure and one-off tooling.
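For Node projects, the test framework usually shows up in devDependencies (a sketch; assumes jq):

```bash
# Sketch: detect the test framework without reading config files
jq -r '.devDependencies // {} | keys[]' package.json \
  | grep -iE 'jest|vitest|playwright|cypress|mocha'
```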
Read deployment docs or CI config. Note:
Goal: Know how the project ships.
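Common places to look, as a sketch (artifact names vary by platform):

```bash
# Sketch: surface CI and deployment config if present
ls .github/workflows/*.yml 2>/dev/null
ls Dockerfile docker-compose.yml vercel.json netlify.toml fly.toml 2>/dev/null
```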
Produce a structured summary covering:
Use plain text or markdown. Do NOT paste raw file contents unless quoting a specific pattern. The report should be readable by a human in 2–3 minutes.
After the standard synthesis, add a maintainability section that surfaces structural risks before they become blockers:
File size audit:
```bash
# Find the largest files (potential monoliths)
find . -type f \( -name "*.py" -o -name "*.js" -o -name "*.ts" -o -name "*.tsx" \) \
  -not -path "*/node_modules/*" -not -path "*/.git/*" -not -path "*/venv/*" \
  | xargs wc -l | sort -rn | head -20
```
Flag any file > 400 lines as a potential monolith. Flag any file > 800 lines as a definite refactor target.
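A sketch that applies both thresholds to the audit output above (TypeScript globs shown; extend the find for other languages):

```bash
# Sketch: bucket files into the 400/800-line flags automatically
find src -name "*.ts" -o -name "*.tsx" | xargs wc -l | awk '
  $2 != "total" && $1 > 800              {print "refactor target:    " $0}
  $2 != "total" && $1 > 400 && $1 <= 800 {print "potential monolith: " $0}'
```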
Method/class count audit:
```bash
# Python: count defs in a file (rough proxy for methods per class)
grep -c "def " app/core/allocation.py
# FastAPI-style routers: count route handlers per file
grep -c "@router." app/api/routes.py
# JS: count functions per file
grep -c "function " app/static/js/charts.js
```
Flag any class with > 15 methods or any file with > 30 functions as violating single-responsibility.
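A sketch that applies the 30-function threshold across a tree (a rough heuristic; arrow functions and class methods need real parsing):

```bash
# Sketch: flag files that exceed the function-count threshold
for f in $(find src -name "*.js" -o -name "*.ts"); do
  n=$(grep -c "function " "$f")
  [ "$n" -gt 30 ] && echo "single-responsibility risk: $f ($n functions)"
done
```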
Coupling signals to check:
Maintainability verdict:
| Grade | Criteria |
|---|---|
| A | Files < 300 lines, clear separation, shared types, registry patterns |
| B | Some files 300-500 lines, minor duplication, mostly clean boundaries |
| C | Several files 500-800 lines, noticeable duplication, fuzzy boundaries |
| D | Multiple files > 800 lines, heavy duplication, no clear separation |
| F | Monolithic files > 1200 lines, everything coupled, changes require touching 4+ files |
Include specific refactor recommendations: "Split AllocationProcessor into Portfolio/Risk/Workforce/Project/People/Trend processors" or "Consolidate 45 loadChart() functions into a chart registry."
Why this matters: The user often asks "what can you tell me about the app?" or "is there a better tech stack?" after a survey. The maintainability assessment gives them the vocabulary and evidence to make that decision, rather than leaving it as an open-ended question.
After reading the codebase, produce a complexity assessment — not just file counts. Answer:
Goal: The survey should convey what kind of project this is, not just what it contains.
After the report, add a brief operating-context note that any future session will benefit from:
Goal: Any future agent that reads this survey report understands who the user is and how they operate, not just what the code contains.
A dump of src/components/ is not a survey — it's a directory listing. Group by function and summarize. If an expected doc such as README.md is missing, note it and move on. Don't fabricate.

The project-warmup skill's lightweight mode handles "answer a question by reading one doc." A deep dive requires reading ALL source files (routes, pipeline, schema, UI) under the domain. The separation line: if you can answer from one doc file, it's lightweight warmup. If you need to read multiple implementation files + the feature doc, it's a targeted deep dive. When in doubt, lean toward deep dive — reading extra files costs minutes, piecing together incomplete context costs the user's time.

References:
- references/codebase-survey-checklist.md — ordered checklist for running a full codebase survey (breadth-first, 12 phases)
- references/targeted-deep-dive-checklist.md — checklist for doing a targeted deep dive into one domain/feature (depth-first, 6 steps). Use when the user asks for a "deep dive of X" in an already-familiar project.