Review LLM Artifacts


Detects common artifacts left by LLM coding agents in four categories (tests, dead code, abstraction, style), scanning either the whole project or only changed files. It uses parallel subagents when the agent supports them and falls back to four sequential passes otherwise. By default it scans files changed since main; use --all for a full-project scan. Triggers on LLM cruft cleanup, agent-generated code review, dead code sweeps, test-quality passes, or when the user asks to scan the whole repo.

Install

openclaw skills install review-llm-artifacts

LLM Artifacts Review

Detect common artifacts left behind by LLM coding agents: over-abstraction, dead code, DRY violations in tests, verbose comments, and defensive overkill.

Hard gates (sequence)

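Advance only when each pass condition is objectively true. This prevents declaring “review complete” without the required outputs (a scope, four category reviews, a JSON report, and passing integrity checks):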

Gate	Pass condition
G1 — Scope	File list is non-empty or you exit with exactly the Step 1 message; `scope` is set to `all` or `changed`.
G2 — Four categories	Tests, dead code, abstraction, and style are each reviewed (four parallel subagent runs when supported, or four sequential passes covering the same categories). Stop if any category did not complete; do not write JSON or a summary that implies a full pass.
G3 — JSON before summary	`.beagle/llm-artifacts-review.json` exists and is valid JSON before Step 6 markdown.
G4 — Integrity	Step 7 checks pass before treating the run as complete.

Arguments

Parse $ARGUMENTS for flags and an optional path:

Flag	Effect
(default)	Changed-files scope: only files changed since `git merge-base HEAD <base>`, where `<base>` is the first of main, origin/main, master, origin/master that exists (PR-style scope)
`--all`	Full project scan — all matching source files under the target path
`--parallel`	Force parallel execution where subagents are supported (default when 4+ files in scope)
Path	Root directory to scan (default: current working directory)
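A minimal Python sketch of the flag parsing follows. It assumes $ARGUMENTS arrives as one whitespace-separated string. The function name `parse_arguments`, its returned keys, and the choice to reject unknown flags are all hypothetical; they are not part of the skill's defined interface.

```python
import shlex


def parse_arguments(raw: str) -> dict:
    """Split a hypothetical $ARGUMENTS string into scope, parallel flag, and target path."""
    opts = {"all": False, "parallel": False, "target": "."}
    for token in shlex.split(raw):
        if token == "--all":
            opts["all"] = True
        elif token == "--parallel":
            opts["parallel"] = True
        elif token.startswith("--"):
            # Assumption of this sketch: unknown flags are an error, not ignored.
            raise ValueError(f"unknown flag: {token}")
        else:
            # Any non-flag token is treated as the root path to scan.
            opts["target"] = token
    opts["scope"] = "all" if opts["all"] else "changed"
    return opts


print(parse_arguments("--all src/"))
```

With no arguments, the sketch falls back to the documented defaults: changed-files scope rooted at the current directory.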

Step 1: Determine Scope

A. Changed files only (default):

Resolve the base ref explicitly and fail loudly if none exists. Do not wrap the git merge-base call in || true: that would silently swallow a missing main/master ref and report "no files to scan" on repos that only have origin/main or use master. If no base ref is found, suggest that the user pass --all rather than silently falling back.

BASE=$(for ref in main origin/main master origin/master; do
         git rev-parse --verify "$ref" >/dev/null 2>&1 && { echo "$ref"; break; }
       done)
if [ -z "$BASE" ]; then
  echo "error: no main/master ref found (checked main, origin/main, master, origin/master). Pass --all for a full-project scan." >&2
  exit 1
fi
MERGE_BASE=$(git merge-base HEAD "$BASE") || {
  echo "error: git merge-base HEAD $BASE failed." >&2
  exit 1
}
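# --diff-filter=d drops deleted files (nothing left to scan); the pathspec limits the diff to TARGET.
FILES=$(git diff --name-only --diff-filter=d "$MERGE_BASE..HEAD" -- "${TARGET:-.}" \
  | grep -E '\.(py|ts|tsx|js|jsx|go|rs|java|rb|swift|kt)$' || true)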

(The trailing || true on the grep is intentional — zero source-file matches is a legitimate empty-scope result, distinct from a failed base-ref resolution.)
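The same decision logic is sketched below in Python as pure functions, so it can be tested without a repository. The caller is assumed to supply the set of refs that exist (for example, gathered with `git rev-parse --verify`) and the lines from `git diff --name-only`. Both function names are hypothetical.

```python
import re

CANDIDATE_BASES = ("main", "origin/main", "master", "origin/master")
SOURCE_RE = re.compile(r"\.(py|ts|tsx|js|jsx|go|rs|java|rb|swift|kt)$")


def pick_base_ref(available_refs: set) -> str:
    """Return the first candidate ref that exists; fail loudly instead of falling back."""
    for ref in CANDIDATE_BASES:
        if ref in available_refs:
            return ref
    raise RuntimeError(
        "no main/master ref found (checked main, origin/main, master, origin/master). "
        "Pass --all for a full-project scan."
    )


def filter_source_files(changed_paths: list) -> list:
    """Keep only source files; an empty result is a valid empty scope, not an error."""
    return [p for p in changed_paths if SOURCE_RE.search(p)]


print(pick_base_ref({"origin/main", "master"}))  # origin/main
```

Keeping the two failure modes in separate functions mirrors the shell version. A missing base ref raises an error, while zero matching files just returns an empty list.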

B. Full project (--all):

From TARGET (default .), list source files, pruning excluded dependency and build trees so find never descends into them. A ! -path "*/foo/*" test only filters find's output; find still walks the whole tree, which can waste minutes of I/O on large node_modules or target directories. Use -prune instead:

FILES=$(find "${TARGET:-.}" \
  \( -type d \( \
       -name node_modules -o -name .git -o -name vendor -o -name __pycache__ \
    -o -name .venv        -o -name venv -o -name dist   -o -name build \
    -o -name target       -o -name .next -o -name coverage -o -name .turbo \
  \) -prune \) -o \
  \( -type f \( \
       -name "*.py" -o -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" \
    -o -name "*.go" -o -name "*.rs" -o -name "*.java" -o -name "*.rb" \
    -o -name "*.swift" -o -name "*.kt" \
  \) -print \))
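If a Python equivalent is useful, for example in a tool that cannot shell out to find, `os.walk` supports the same pruning. With the default top-down walk, removing entries from `dirnames` in place means the excluded directories are never entered. Filtering the results afterwards would not have that effect. The function name `list_source_files` is hypothetical.

```python
import os

EXCLUDED_DIRS = {"node_modules", ".git", "vendor", "__pycache__", ".venv", "venv",
                 "dist", "build", "target", ".next", "coverage", ".turbo"}
SOURCE_EXTS = {".py", ".ts", ".tsx", ".js", ".jsx", ".go", ".rs", ".java",
               ".rb", ".swift", ".kt"}


def list_source_files(target: str = ".") -> list:
    """Walk target, pruning excluded trees before descending (the -prune equivalent)."""
    found = []
    for dirpath, dirnames, filenames in os.walk(target):
        # Mutating dirnames in place stops os.walk from ever entering these dirs,
        # unlike filtering the output afterwards.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1] in SOURCE_EXTS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```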

Large repos: The --all path can produce huge file lists. If file count exceeds 400, warn and suggest narrowing: pass a subdirectory as TARGET, or drop --all to fall back to the default changed-files scope. Still proceed unless the user explicitly cancels. (This warning does not fire on the default changed-files scope, which is already bounded by the PR diff.)

If no files are found, exit with:

No files to scan. Check the path, branch, or pass --all for a full-project scan.

Set scope in the report: "all" for --all, "changed" for the default changed-files scope.
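The three end-of-scope rules (exit on an empty list, warn above 400 files only for --all, set the scope label) are sketched below in Python. The function name and the warning wording are assumptions; the exit message is copied from the text above.

```python
NO_FILES_MESSAGE = "No files to scan. Check the path, branch, or pass --all for a full-project scan."
LARGE_SCAN_THRESHOLD = 400


def finalize_scope(files: list, all_flag: bool):
    """Return (scope, warning); raise SystemExit with the Step 1 message if empty."""
    if not files:
        raise SystemExit(NO_FILES_MESSAGE)
    scope = "all" if all_flag else "changed"
    warning = None
    # The size warning only applies to --all; changed-files scope is bounded by the diff.
    if all_flag and len(files) > LARGE_SCAN_THRESHOLD:
        warning = (f"{len(files)} files in scope; consider passing a subdirectory "
                   "as TARGET or dropping --all.")
    return scope, warning
```

The warning is returned rather than raised because the scan should still proceed unless the user cancels.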

Step 2: Detect Languages

Extract unique file extensions from the file list:

echo "$FILES" | sed 's/.*\.//' | sort -u

Map extensions to language names for the report:

.py -> Python
.ts, .tsx -> TypeScript
.js, .jsx -> JavaScript
.go -> Go
.rs -> Rust
.java -> Java
.rb -> Ruby
.swift -> Swift
.kt -> Kotlin
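The mapping translates directly into a lookup table, sketched here in Python. Unlike the `sed | sort -u` pipeline, this collapses `.ts`/`.tsx` and `.js`/`.jsx` into one language name each. Sorting the result alphabetically is a choice of this sketch, since the report format does not specify an order. `detect_languages` is a hypothetical name.

```python
import os

EXT_TO_LANGUAGE = {
    ".py": "Python", ".ts": "TypeScript", ".tsx": "TypeScript",
    ".js": "JavaScript", ".jsx": "JavaScript", ".go": "Go", ".rs": "Rust",
    ".java": "Java", ".rb": "Ruby", ".swift": "Swift", ".kt": "Kotlin",
}


def detect_languages(files: list) -> list:
    """Map file extensions to report language names, de-duplicated and sorted."""
    exts = (os.path.splitext(f)[1] for f in files)
    return sorted({EXT_TO_LANGUAGE[e] for e in exts if e in EXT_TO_LANGUAGE})


print(detect_languages(["a.ts", "b.tsx", "c.py", "d.go"]))  # ['Go', 'Python', 'TypeScript']
```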

Step 3: Review the Four Categories

Cover all four categories below. If the agent supports subagents and file count >= 4 (or --parallel is set), dispatch one subagent per category in parallel. Otherwise, run the four category reviews sequentially yourself, producing the same findings. Either way:

Load the llm-artifacts-detection skill
Review each category (one per subagent when parallel, one pass at a time when sequential)
Collect findings in the structured format below
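The dispatch rule and the "every category must finish" requirement from G2 are sketched below in Python. Threads stand in for subagents here. `review` is a hypothetical callable taking `(category, files)` and returning that category's findings, and both function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

CATEGORIES = ("tests", "dead_code", "abstraction", "style")


def should_parallelize(supports_subagents: bool, file_count: int, force: bool) -> bool:
    """Parallel only when subagents exist and either --parallel is set or >= 4 files."""
    return supports_subagents and (force or file_count >= 4)


def review_all_categories(review, files, parallel: bool) -> dict:
    """Run one review per category; raise if any fails so G2 cannot pass silently."""
    if parallel:
        with ThreadPoolExecutor(max_workers=len(CATEGORIES)) as pool:
            futures = {c: pool.submit(review, c, files) for c in CATEGORIES}
            # .result() re-raises any exception from the category review.
            return {c: f.result() for c, f in futures.items()}
    return {c: review(c, files) for c in CATEGORIES}
```

Both branches return the same dictionary shape, so downstream consolidation does not need to know which mode ran.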

Category 1: Tests

Focus: Testing anti-patterns from LLM generation

DRY violations (repeated setup code, duplicate assertions)
Testing library/framework code instead of application logic
Wrong mock boundaries (mocking too much or too little)
Overly verbose test names that describe implementation
Tests that just mirror the implementation

Category 2: Dead Code

Focus: Unused or obsolete code

Unused imports, variables, functions, classes
TODO/FIXME comments that should have been resolved
Backwards compatibility code for removed features
Orphaned test files for deleted code
Commented-out code blocks
Feature flags that are always on/off

Category 3: Abstraction

Focus: Over-engineering patterns

Unnecessary abstraction layers (interfaces for single implementations)
Copy-paste drift (similar code that diverged slightly)
Over-configuration (configurable things that never change)
Premature generalization
Factory/Builder patterns for simple object creation
Deep inheritance hierarchies

Category 4: Style

Focus: Verbose or defensive patterns

Verbose comments explaining obvious code
Defensive overkill (null checks on non-nullable values)
Unnecessary type hints (dynamic languages with obvious types)
Overly explicit error messages
Redundant logging
Redundant documentation on self-documenting code
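The following hypothetical Python example illustrates two things the style pass looks for: comments that restate each line, and a null check on a parameter whose signature already declares it non-optional. The two functions agree for dict inputs. However, the verbose one also tolerates `None` and a stored `None` name, so simplifying it is only safe if no caller relies on that. This is why such a finding may deserve "Needs review" rather than "Safe".

```python
def get_user_name_verbose(user: dict) -> str:
    # Check if the user is not None before accessing it
    if user is not None:
        # Get the name from the user dictionary
        name = user.get("name")
        # Return the name if it exists
        if name is not None:
            return name
    return ""


def get_user_name(user: dict) -> str:
    # Same result for any dict whose "name" is a str or absent.
    return user.get("name", "")
```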

Step 4: Consolidate Findings

Prerequisite: G2 satisfied (all four category reviews finished successfully).

Once all four category reviews have completed (parallel subagents or sequential passes), then:

Merge all findings into a single list
Assign unique IDs (1, 2, 3...)
Group by category for display

Echo before write (anti-confabulation): Every finding written to JSON MUST come from a category review's [FILE:LINE] ISSUE_TYPE output (see Output Format for Each Finding), not from the branch name, directory, or your own inference. After assigning ids, echo the consolidated table (id | category | file:line | description) and confirm that each row traces to a specific category result. Do not add findings that no category review reported.

ID lock: Ids are contiguous 1..N with no gaps or duplicates. This 1..N set is the locked id set that downstream skills (verify-llm-artifacts, fix-llm-artifacts) bind to 1:1. summary.total MUST equal N, and summary.by_category counts MUST sum to N. State the id set before writing JSON.
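A minimal Python sketch of the consolidation and id lock follows. It assumes per-category results shaped as `{category: [finding, ...]}` with `file`, `line`, and `description` keys. The function names are hypothetical. Because ids come from `enumerate`, they are contiguous by construction, and `summary.total` and `by_category` are computed from the same list, so they cannot drift apart.

```python
from collections import Counter

CATEGORIES = ("tests", "dead_code", "abstraction", "style")


def consolidate(results: dict) -> tuple:
    """Merge per-category findings, assign contiguous ids 1..N, and build the summary."""
    findings = []
    for category in CATEGORIES:
        for finding in results[category]:  # KeyError if a category never reported (G2)
            findings.append({**finding, "category": category})
    for i, finding in enumerate(findings, start=1):
        finding["id"] = i
    by_category = Counter(f["category"] for f in findings)
    summary = {
        "total": len(findings),
        "by_category": {c: by_category.get(c, 0) for c in CATEGORIES},
    }
    return findings, summary


def echo_table(findings: list) -> str:
    """Render the pre-write echo: id | category | file:line | description."""
    rows = ["id | category | file:line | description"]
    rows += [f'{f["id"]} | {f["category"]} | {f["file"]}:{f["line"]} | {f.get("description", "")}'
             for f in findings]
    return "\n".join(rows)
```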

Step 5: Write JSON Report

Create the .beagle directory if it doesn't exist:

mkdir -p .beagle

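Write findings to .beagle/llm-artifacts-review.json. In the schema below, a | b lists the allowed values for a field and "..." stands for other finding types; the file itself must hold exactly one value per field so that it parses as JSON: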

{
  "version": "1.0.0",
  "created_at": "2024-01-15T10:30:00Z",
  "git_head": "abc1234",
  "scope": "all" | "changed",
  "target": ".",
  "files_scanned": 42,
  "languages": ["Python", "TypeScript", "Go"],
  "findings": [
    {
      "id": 1,
      "category": "tests" | "dead_code" | "abstraction" | "style",
      "type": "dry_violation" | "unused_import" | "over_abstraction" | "verbose_comment" | "...",
      "file": "src/utils/helper.py",
      "line": 42,
      "description": "Repeated setup code in 5 test functions",
      "suggestion": "Extract to a pytest fixture",
      "risk": "Low" | "Medium" | "High",
      "fix_safety": "Safe" | "Needs review",
      "fix_action": "refactor" | "delete" | "simplify" | "extract"
    }
  ],
  "summary": {
    "total": 15,
    "by_category": {
      "tests": 4,
      "dead_code": 5,
      "abstraction": 3,
      "style": 3
    },
    "by_risk": {
      "High": 2,
      "Medium": 8,
      "Low": 5
    },
    "by_fix_safety": {
      "Safe": 10,
      "Needs review": 5
    }
  }
}
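One way to enforce the schema before writing is sketched below in Python. The allowed values are taken from the schema above. `type` is deliberately not checked because the schema leaves it open with "...". The function names and the refuse-to-write behavior are assumptions of this sketch.

```python
import json
import os

ALLOWED = {
    "category": {"tests", "dead_code", "abstraction", "style"},
    "risk": {"Low", "Medium", "High"},
    "fix_safety": {"Safe", "Needs review"},
    "fix_action": {"refactor", "delete", "simplify", "extract"},
}


def validate_finding(finding: dict) -> list:
    """Return a list of problems; an empty list means the finding fits the schema."""
    problems = [f"{k}={finding.get(k)!r}" for k, allowed in ALLOWED.items()
                if finding.get(k) not in allowed]
    if not finding.get("file") or not isinstance(finding.get("line"), int):
        problems.append("missing file:line")
    return problems


def write_report(report: dict, path: str = ".beagle/llm-artifacts-review.json") -> None:
    """Refuse to write invalid findings, then write pretty-printed JSON."""
    bad = {f["id"]: p for f in report["findings"] if (p := validate_finding(f))}
    if bad:
        raise ValueError(f"invalid findings: {bad}")
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
```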

Step 6: Display Summary

Prerequisite: G3 satisfied (JSON on disk and parseable).

## LLM Artifacts Review

**Scope:** Changed files since merge-base with main | Entire project under `<path>` (when `--all`)
**Files scanned:** 42
**Languages:** Python, TypeScript, Go

### Findings by Category
...
### Summary Table
...
### Next Steps

- Run the [verify-llm-artifacts](../verify-llm-artifacts/SKILL.md) skill to confirm findings and drop false positives before fixing.
- Run the [fix-llm-artifacts](../fix-llm-artifacts/SKILL.md) skill after verification (or to preview safe-only fixes).
- Review the JSON report at `.beagle/llm-artifacts-review.json`
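A Python sketch of how the concrete header lines of this template could be rendered is shown below. It reads the report back from disk first, so the summary can only describe a JSON file that already exists (G3). The sections the template elides with "..." are intentionally left out rather than guessed. Both function names are hypothetical.

```python
import json


def render_header(report: dict) -> str:
    """Render the concrete header lines of the Step 6 template from the saved report."""
    if report["scope"] == "all":
        scope = f"Entire project under `{report['target']}`"
    else:
        scope = "Changed files since merge-base with main"
    return "\n".join([
        "## LLM Artifacts Review",
        "",
        f"**Scope:** {scope}",
        f"**Files scanned:** {report['files_scanned']}",
        f"**Languages:** {', '.join(report['languages'])}",
    ])


def load_and_render(path: str = ".beagle/llm-artifacts-review.json") -> str:
    """Read the JSON from disk first, so the summary reflects the written report (G3)."""
    with open(path, encoding="utf-8") as fh:
        return render_header(json.load(fh))
```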

Step 7: Verification (report integrity)

Before completing, verify the review executed correctly:

JSON validity: Confirm .beagle/llm-artifacts-review.json exists and is parseable
Category coverage: All 4 category reviews completed without errors (parallel subagents or sequential passes)
Git HEAD captured: The git_head field is non-empty in the report
Staleness check: If a previous report exists, compare stored git_head to current HEAD and warn if different
ID + count integrity: Finding ids are contiguous 1..N; summary.total == N; summary.by_category sums to N. A mismatch means a finding was added, dropped, or duplicated — fix before completing.

python3 -c "import json; json.load(open('.beagle/llm-artifacts-review.json'))" 2>/dev/null && echo "✓ Valid JSON" || echo "✗ Invalid JSON"

python3 - <<'PY'
import json, sys
with open('.beagle/llm-artifacts-review.json') as fh:
    r = json.load(fh)
ids = [x['id'] for x in r['findings']]
n = len(ids)
by_cat_sum = sum(r['summary']['by_category'].values())
ok = ids == list(range(1, n + 1)) and r['summary']['total'] == n and by_cat_sum == n
print("✓ ids 1..N and counts consistent" if ok
      else f"✗ id/count mismatch: ids={ids} total={r['summary']['total']} by_category_sum={by_cat_sum}")
# Non-zero exit lets callers stop on failure instead of parsing the message
sys.exit(0 if ok else 1)
PY

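if [ -f .beagle/llm-artifacts-review.json ]; then
  STORED_HEAD=$(jq -r '.git_head // empty' .beagle/llm-artifacts-review.json 2>/dev/null)
  # Resolve both sides to full SHAs so short and full hashes compare correctly
  STORED_FULL=""
  [ -n "$STORED_HEAD" ] && STORED_FULL=$(git rev-parse --verify --quiet "${STORED_HEAD}^{commit}" || true)
  if [ "$STORED_FULL" != "$(git rev-parse HEAD)" ]; then
    echo "⚠️ Report was generated on ${STORED_HEAD:-<unknown>}, current HEAD is $(git rev-parse --short HEAD)"
  fi
fi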

If any verification fails, report the error and do not proceed.

Finding-level verification (precision, not JSON syntax) is handled by a separate skill, verify-llm-artifacts. Run it before mass deletions or --fix runs on risky items.

Output Format for Each Finding

[FILE:LINE] **ISSUE_TYPE** (Risk, Fix Safety)
- Description
- Suggestion: Specific fix recommendation
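A Python sketch of this format, filled from the JSON report fields, is shown below. It assumes ISSUE_TYPE corresponds to the `type` field. The companion `parse_header` shows how a [FILE:LINE] header can be parsed back when tracing findings to category output. Both names are hypothetical.

```python
import re

HEADER_RE = re.compile(r"^\[(?P<file>[^\]]+):(?P<line>\d+)\] \*\*(?P<type>[^*]+)\*\*")


def format_finding(f: dict) -> str:
    """Render one finding in the per-finding output format."""
    return (
        f"[{f['file']}:{f['line']}] **{f['type']}** ({f['risk']}, {f['fix_safety']})\n"
        f"- {f['description']}\n"
        f"- Suggestion: {f['suggestion']}"
    )


def parse_header(text: str):
    """Recover (file, line, type) from a finding header, or None if it does not match."""
    m = HEADER_RE.match(text)
    return (m["file"], int(m["line"]), m["type"]) if m else None


print(format_finding({
    "file": "src/utils/helper.py", "line": 42, "type": "dry_violation",
    "description": "Repeated setup code in 5 test functions",
    "suggestion": "Extract to a pytest fixture", "risk": "Low", "fix_safety": "Safe",
}))
```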

Rules

Follow the hard-gate order; do not skip G3 (JSON before Step 6).
Anti-confabulation: every finding must trace to a category review's [FILE:LINE] output (Step 4 echo); never invent findings from the branch name, directory, or inference. See the review-verification-protocol skill → Anti-confabulation (gate 0).
Always load the llm-artifacts-detection skill first
Use parallel subagents (when the agent supports them) for the four category reviews when >= 4 files; otherwise run them sequentially
Every finding MUST have file:line reference
Categorize risk honestly (don't inflate or deflate)
Mark fix safety as "Safe" only if the change is mechanical and reversible
Create the .beagle directory if needed
Write JSON report before displaying summary
Default scope is changed files since merge-base with main; pass --all for a full-project scan