# Skill Validator — Detailed Scoring Rubric

This document provides the complete, granular scoring rules for each validation dimension. Load this file when you need to apply precise scoring logic during a validation run.

---

## Table of Contents

1. [Dimension 1: Directory Structure (10 pts)](#dimension-1-directory-structure-10-pts)
2. [Dimension 2: Frontmatter Compliance (30 pts)](#dimension-2-frontmatter-compliance-30-pts)
3. [Dimension 3: Body Content Quality (25 pts)](#dimension-3-body-content-quality-25-pts)
4. [Dimension 4: Progressive Disclosure Design (15 pts)](#dimension-4-progressive-disclosure-design-15-pts)
5. [Dimension 5: Optional Directory Quality (10 pts)](#dimension-5-optional-directory-quality-10-pts)
6. [Dimension 6: Description Trigger Optimization (10 pts)](#dimension-6-description-trigger-optimization-10-pts)
7. [Grade Thresholds](#grade-thresholds)
8. [Edge Cases and Special Situations](#edge-cases-and-special-situations)

---

## Dimension 1: Directory Structure (10 pts)

### Check 1.1 — SKILL.md Exists (4 pts)

**How to check:** Look for a file named exactly `SKILL.md` (case-sensitive) in the skill root directory.

| Situation | Score |
|-----------|-------|
| `SKILL.md` exists in the skill root | 4 pts |
| `SKILL.md` exists but in a subdirectory (not root) | 1 pt — flag as warning |
| `skill.md` or `Skill.md` exists (wrong case) | 1 pt — flag as failure |
| No SKILL.md found anywhere | 0 pts — CRITICAL FAILURE, stop dimension scoring |

**Evidence to quote:** The file path found (or not found).

---

### Check 1.2 — Directory Name Matches `name` Field (3 pts)

**How to check:** Compare the parent directory name of SKILL.md with the `name` frontmatter value.

| Situation | Score |
|-----------|-------|
| Directory name exactly matches `name` field | 3 pts |
| Directory name matches but with different case (e.g., dir=`PDF-Tool`, name=`pdf-tool`) | 1 pt — flag as failure |
| Directory name is completely different from `name` | 0 pts — flag as failure |
| Cannot determine directory name (e.g., single file provided) | 2 pts — flag as warning, note limitation |

**Evidence to quote:** `Directory: "pdf-processing"`, `name field: "pdf-processing"` → Match ✅

---

### Check 1.3 — Optional Directories Used Appropriately (2 pts)

**How to check:** For each directory that exists, verify the content matches the directory's purpose.

**All standard directories** — do not flag these as unexpected:

| Directory | Purpose | Expected Content |
|-----------|---------|-----------------|
| `scripts/` | Executable code | `.py`, `.sh`, `.js` scripts |
| `references/` | Documentation | Markdown files loaded on demand |
| `assets/` | Static resources | Templates, images, data files |
| `evals/` | Test cases | `evals.json` and evaluation data (skill-creator workflow) |
| `agents/` | Subagent instructions | Markdown files for specialized subagents (e.g., grader.md, analyzer.md) |

**Wrong content examples:**

| Directory | Wrong Content |
|-----------|---------------|
| `scripts/` | Documentation, images, templates |
| `references/` | Scripts, binary files, templates |
| `assets/` | Scripts, documentation |

| Situation | Score |
|-----------|-------|
| All present dirs used correctly for their purpose | 2 pts |
| One dir has minor misuse (e.g., a README in scripts/) | 1 pt — flag as warning |
| One or more dirs clearly misused | 0 pts — flag as failure |
| No optional directories present | 2 pts (absence is fine) |

---

### Check 1.4 — No Unexpected/Suspicious Files (1 pt)

**How to check:** Scan all files for anything that seems out of place or potentially harmful.

| Situation | Score |
|-----------|-------|
| All files serve a legitimate, expected purpose | 1 pt |
| Unexpected files present but harmless (e.g., `.DS_Store`, `__pycache__/`) | 1 pt — note as minor warning |
| Files with suspicious names or content (e.g., `exploit.py`, `exfiltrate.sh`) | 0 pts — CRITICAL FAILURE, flag prominently |
| Files that contradict the skill's stated purpose | 0 pts — flag as failure |

---

## Dimension 2: Frontmatter Compliance (30 pts)

### Check 2.1 — `name` Field (10 pts)

Apply each sub-check independently. Start with 10 pts and deduct for failures.

| Sub-check | Points | Deduction if Failed |
|-----------|--------|---------------------|
| Field exists in frontmatter | 3 pts | -10 (if missing, score = 0 for this field) |
| Length is 1–64 characters | 2 pts | -2 |
| Only contains `[a-z0-9-]` | 2 pts | -2 |
| Does not start or end with `-` | 1 pt | -1 |
| Does not contain `--` | 1 pt | -1 |
| Matches parent directory name | 1 pt | -1 |

**Minimum score:** 0 (cannot go negative)

**How to check each rule:**
- **Length:** `len(name_value)` — must be 1 ≤ len ≤ 64
- **Character set:** regex `^[a-z0-9-]+$` — must match
- **No leading/trailing hyphen:** regex `^[^-].*[^-]$` or `^[^-]$` for single char
- **No consecutive hyphens:** `--` must not appear anywhere in the value
- **Directory match:** string equality comparison

**Evidence to quote:** The exact `name` value in quotes, e.g., `name: "my-skill"` (12 chars, valid pattern, matches directory "my-skill").

---

### Check 2.2 — `description` Field (10 pts)

Apply each sub-check independently. Start with 10 pts and deduct for failures.

| Sub-check | Points | Deduction if Failed |
|-----------|--------|---------------------|
| Field exists in frontmatter | 3 pts | -10 (if missing, score = 0 for this field) |
| Length is 1–1024 characters | 2 pts | -2 |
| Describes WHAT the skill does | 2 pts | -2 |
| Describes WHEN to use it | 2 pts | -2 |
| Contains specific trigger keywords | 1 pt | -1 |

**Minimum score:** 0

**How to check each rule:**

**Length:** Count characters. Flag if:
- Under 30 chars: almost certainly too vague (warning even if technically valid)
- 30–50 chars: likely too vague (warning)
- 51–1024 chars: valid range
- Over 1024 chars: spec violation (-2)

**WHAT check:** Does the description mention specific capabilities, actions, or outputs? Look for verbs describing what the skill does: "extracts", "generates", "validates", "converts", "analyzes", etc. A description that only names the skill ("A skill for PDFs") without describing capabilities fails this check.

**WHEN check:** Does the description include trigger conditions? Look for phrases like:
- "Use when..."
- "Use this when..."
- "TRIGGER when..."
- "Trigger when..."
- "when the user..."
- "when working with..."
- "when you need to..."

A description with no trigger guidance fails this check.

**Keywords check:** Are there specific, searchable terms that match real user requests? Generic terms ("things", "tasks", "content") don't count. Specific terms ("PDF", "Playwright", "MCP server", "brand colors", "JWT") do count.

**Evidence to quote:** The full description value (or first 200 chars if very long), with specific phrases highlighted.

---

### Check 2.3 — `license` Field (3 pts)

| Situation | Score |
|-----------|-------|
| Field absent | 3 pts (optional, absence is fine) |
| Field present with a recognizable license name (MIT, Apache-2.0, etc.) | 3 pts |
| Field present referencing a bundled file (e.g., "Complete terms in LICENSE.txt") | 3 pts |
| Field present but value is unclear or empty | 1 pt — flag as warning |
| Field present with full license text inline (hundreds of chars) | 1 pt — flag as warning, suggest moving to LICENSE.txt |

---

### Check 2.4 — `compatibility` Field (3 pts)

| Situation | Score |
|-----------|-------|
| Field absent | 3 pts (optional, absence is fine) |
| Field present, 1–500 chars, describes real requirements | 3 pts |
| Field present, 1–500 chars, but vague ("works everywhere") | 2 pts — flag as warning |
| Field present, over 500 chars | 1 pt — flag as failure |
| Field present but empty | 0 pts — flag as failure |

---

### Check 2.5 — `metadata` Field (2 pts)

| Situation | Score |
|-----------|-------|
| Field absent | 2 pts (optional, absence is fine) |
| Field present as a valid key-value mapping | 2 pts |
| Field present but as a list (not a mapping) | 0 pts — flag as failure |
| Field present but as a scalar string | 0 pts — flag as failure |

**Valid example:**
```yaml
metadata:
  author: my-org
  version: "1.0"
```

**Invalid examples:**
```yaml
metadata: some-string          # scalar, not mapping
metadata:
  - item1                      # list, not mapping
  - item2
```

---

### Check 2.6 — `allowed-tools` Field (2 pts)

| Situation | Score |
|-----------|-------|
| Field absent | 2 pts (optional, absence is fine) |
| Field present as space-delimited list of valid tool names | 2 pts |
| Field present but as a YAML list (not space-delimited string) | 1 pt — flag as warning |
| Field present but tool names don't follow expected format | 1 pt — flag as warning |

**Valid format:** `allowed-tools: Bash(git:*) Bash(jq:*) Read Write`
**Invalid format:** 
```yaml
allowed-tools:
  - Bash(git:*)
  - Read
```

---

## Dimension 3: Body Content Quality (25 pts)

### Check 3.1 — Has Substantive Content (5 pts)

**How to check:** Read the body (everything after the closing `---` of frontmatter). Assess whether it contains real instructions.

| Situation | Score |
|-----------|-------|
| Body has multiple paragraphs of real instructions | 5 pts |
| Body has some content but is very thin (1-2 sentences) | 2 pts — flag as warning |
| Body is just a title with no content | 1 pt — flag as failure |
| Body is empty or only whitespace | 0 pts — flag as failure |

**Evidence:** Quote the first few lines of the body.

---

### Check 3.2 — Body Length (5 pts)

**How to check:** Count the total lines in the body (excluding frontmatter).

| Line Count | Score | Status |
|------------|-------|--------|
| 1–500 lines | 5 pts | ✅ |
| 501–600 lines | 3 pts | ⚠️ Approaching limit |
| 601–800 lines | 1 pt | ❌ Should split content |
| 801+ lines | 0 pts | ❌ Violates progressive disclosure |

**Evidence:** "Body is X lines."

---

### Check 3.3 — Includes Step-by-Step Instructions or Clear Workflow (5 pts)

**How to check:** Look for structured workflow content. This can be:
- Numbered steps (1. Do X, 2. Do Y)
- Ordered sections (Step 1, Step 2, Phase 1, Phase 2)
- A decision tree or flowchart
- A clear sequence of actions

| Situation | Score |
|-----------|-------|
| Clear step-by-step workflow or decision tree present | 5 pts |
| Some structure but not a complete workflow | 3 pts — flag as warning |
| Only descriptive content, no actionable steps | 1 pt — flag as failure |
| No structured content at all | 0 pts — flag as failure |

---

### Check 3.4 — Uses Imperative Form (3 pts)

**How to check:** Sample 10–20 instruction sentences from the body. Count how many start with an imperative verb vs. passive/descriptive phrasing.

**Imperative verbs (good):** Read, Run, Check, Use, Create, Write, Ensure, Avoid, Load, Parse, Generate, Output, Return, Save, Validate, Compare, List, Fetch, Scan, Apply, etc.

**Passive/descriptive (bad):** "The skill will...", "Claude should...", "This section describes...", "It is recommended that...", "You might want to..."

| Ratio of imperative sentences | Score |
|-------------------------------|-------|
| >70% imperative | 3 pts |
| 50–70% imperative | 2 pts |
| 30–50% imperative | 1 pt |
| <30% imperative | 0 pts |

**Evidence:** Quote 2-3 example sentences, one good and one bad.

---

### Check 3.5 — Explains the "Why" (3 pts)

**How to check:** Look for explanatory language that gives rationale for instructions.

**Positive signals:**
- "because [reason]"
- "so that [outcome]"
- "this helps [goal]"
- "this ensures [property]"
- "the reason is [explanation]"
- "this prevents [problem]"
- "this allows [capability]"

**Negative signals:**
- Instructions with no rationale
- Heavy use of "MUST", "ALWAYS", "NEVER" without explanation
- Rules that feel arbitrary (no context given)

**All-caps command word frequency check (from skill-creator):**
Count occurrences of `ALWAYS`, `NEVER`, `MUST`, `DO NOT`, `NEVER EVER` used as standalone directives (not as part of a quoted example or code). Calculate the rate per 100 lines of body content.

| All-caps rate (per 100 lines) | Signal |
|-------------------------------|--------|
| 0–2 occurrences | Normal — no issue |
| 3–5 occurrences | ⚠️ Yellow flag — check if rationale is provided alongside each |
| 6+ occurrences | ❌ Failure — over-reliance on commands instead of explanation |

Note: A single `ALWAYS` with a clear "because" clause is fine. The problem is a pattern of bare commands with no reasoning — it suggests the skill is trying to force behavior rather than help Claude understand the task.

| Situation | Score |
|-----------|-------|
| Multiple instructions include clear rationale | 3 pts |
| Some rationale present but inconsistent | 2 pts |
| Very little rationale, mostly bare commands | 1 pt |
| No rationale at all, pure command list | 0 pts |

---

### Check 3.6 — Defines Output Format (4 pts)

**How to check:** Look for any of the following:
- An explicit output template (e.g., "Use this exact format: ...")
- A Markdown example showing expected output
- A schema or structure definition
- An example of what the final result should look like

| Situation | Score |
|-----------|-------|
| Explicit output template or schema provided | 4 pts |
| Example output shown but no formal template | 3 pts |
| Output format described in prose but no example | 2 pts |
| No output format guidance at all | 0 pts |

---

## Dimension 4: Progressive Disclosure Design (15 pts)

### Check 4.1 — Metadata Tier is Concise (3 pts)

**How to check:** Estimate the word count of `name` + `description` combined. A rough estimate: 1 word ≈ 5 characters. The spec (via skill-creator) targets ~100 words for the entire metadata tier.

| Description Word Count | Approx. Characters | Score |
|------------------------|-------------------|-------|
| Under 100 words (~500 chars) | Under 500 chars | 3 pts |
| 100–150 words (~500–750 chars) | 500–750 chars | 2 pts — acceptable but slightly heavy |
| 150–200 words (~750–1000 chars) | 750–1000 chars | 1 pt — flag as warning |
| Over 200 words (1000+ chars) | 1000+ chars | 0 pts — flag as failure |

**Note:** The description has a hard limit of 1024 chars, but for triggering efficiency, shorter is better. The goal is ~100 words for the entire metadata tier — this is what's always in context for every skill, so it should be lean.

---

### Check 4.2 — Instructions Tier is Appropriately Sized (4 pts)

**How to check:** Count SKILL.md body lines (same as Check 3.2, but scored separately for architectural reasons).

| Line Count | Score |
|------------|-------|
| Under 300 lines | 4 pts — excellent |
| 300–500 lines | 3 pts — good |
| 501–600 lines | 2 pts — acceptable |
| 601–800 lines | 1 pt — should refactor |
| 800+ lines | 0 pts — architectural failure |

---

### Check 4.3 — Large Reference Material in references/ (4 pts)

**How to check:** Look for large inline content in SKILL.md that should be in references/:
- Tables over 30 rows
- Code blocks over 50 lines
- Specification text that's reference material, not instructions
- Domain-specific documentation (e.g., AWS vs GCP vs Azure guides all inline)

| Situation | Score |
|-----------|-------|
| No large inline reference material (or it's appropriately sized) | 4 pts |
| Some large inline content but SKILL.md is still under 500 lines | 3 pts — minor warning |
| Large inline content pushing SKILL.md over 500 lines | 1 pt — flag as failure |
| Massive inline reference content (SKILL.md is 800+ lines because of it) | 0 pts — architectural failure |

---

### Check 4.4 — Reusable Scripts in scripts/ (4 pts)

**How to check:** Look for large code blocks in SKILL.md that could be scripts.

**Signals that code should be in scripts/:**
- Code block is 30+ lines
- The same code pattern appears multiple times
- The code is a complete, runnable script (not just a snippet)
- The skill-creator notes: "if all 3 test cases resulted in the subagent writing a `create_docx.py`, that's a strong signal the skill should bundle that script"

| Situation | Score |
|-----------|-------|
| No large inline code blocks (or skill doesn't need scripts) | 4 pts |
| Small code snippets inline (under 30 lines each) | 4 pts — appropriate |
| One large code block (30–80 lines) inline | 2 pts — flag as warning |
| Multiple large code blocks or one 80+ line block inline | 0 pts — flag as failure |

---

## Dimension 5: Optional Directory Quality (10 pts)

**Important:** Only score directories that exist. If a directory doesn't exist, award full points for it.

### scripts/ (3 pts, if present)

#### Check 5.1 — Self-contained or Documents Dependencies (1 pt)

| Situation | Score |
|-----------|-------|
| Scripts import only standard library modules | 1 pt |
| Scripts import third-party modules AND document them (requirements.txt, comments, --help) | 1 pt |
| Scripts import third-party modules with no documentation | 0 pts |

#### Check 5.2 — Helpful Error Messages / --help (1 pt)

| Situation | Score |
|-----------|-------|
| Scripts have `--help` flag or clear usage documentation | 1 pt |
| Scripts have some error handling but no --help | 0.5 pts |
| Scripts have no error handling or usage documentation | 0 pts |

#### Check 5.3 — Handles Edge Cases (1 pt)

| Situation | Score |
|-----------|-------|
| Scripts handle missing files, bad input, network errors, etc. | 1 pt |
| Scripts have basic error handling (try/except) | 0.5 pts |
| Scripts assume happy path only | 0 pts |

---

### references/ (4 pts, if present)

#### Check 5.4 — Each File is Focused (2 pts)

**How to check:** Read each reference file. Does it cover one topic or many?

| Situation | Score |
|-----------|-------|
| Each file covers a single, well-defined topic | 2 pts |
| Files are somewhat focused but have some scope creep | 1 pt |
| Files are catch-all documents covering many unrelated topics | 0 pts |

#### Check 5.5 — Files Under 300 Lines (1 pt)

| Situation | Score |
|-----------|-------|
| All reference files are under 300 lines | 1 pt |
| One file is 300–500 lines (with table of contents) | 0.5 pts |
| Any file is over 500 lines without a table of contents | 0 pts |

#### Check 5.6 — Clearly Referenced from SKILL.md (1 pt)

| Situation | Score |
|-----------|-------|
| SKILL.md explicitly references each file with guidance on when to read it | 1 pt |
| SKILL.md mentions the references/ directory but not specific files | 0.5 pts |
| Reference files exist but are never mentioned in SKILL.md | 0 pts |

---

### assets/ (3 pts, if present)

#### Check 5.7 — Appropriate Content (2 pts)

| Situation | Score |
|-----------|-------|
| Assets are templates, images, data files, or schemas | 2 pts |
| Assets include some inappropriate content (e.g., scripts in assets/) | 1 pt |
| Assets are clearly wrong type for this directory | 0 pts |

#### Check 5.8 — Assets Are Referenced (1 pt)

| Situation | Score |
|-----------|-------|
| All assets are referenced or used by the skill | 1 pt |
| Some assets are referenced, some are not | 0.5 pts |
| No assets are referenced in SKILL.md | 0 pts |

---

## Dimension 6: Description Trigger Optimization (10 pts)

### Check 6.0 — Pre-check: "When to Use" Belongs in Description, Not Body (Warning only, no point deduction)

**How to check:** Scan the SKILL.md body for section headings that indicate trigger/activation guidance:
- "When to Use"
- "When to Use This Skill"
- "Trigger Conditions"
- "When This Skill Applies"
- "When Should This Skill Activate"
- "Use Cases" (if it's describing when to invoke, not what the skill produces)

**Why this matters:** From skill-creator: "All 'when to use' info goes here [in description], not in the body." The description is what Claude reads to decide whether to activate a skill. If trigger conditions are buried in the body, Claude may not see them at activation time — they only help once the skill is already running. Moving this content to the description directly improves triggering accuracy.

| Situation | Action |
|-----------|--------|
| No "when to use" section in body | No action needed |
| Body has a "when to use" section AND description also covers it | ⚠️ Warning: suggest removing the body section (redundant) |
| Body has a "when to use" section AND description lacks trigger conditions | ⚠️ Warning: suggest moving the content to description |

This is a warning, not a point deduction — the skill still works, but it's not optimally structured.

---

### Check 6.1 — Explicit Trigger Conditions (3 pts)

**How to check:** Look for explicit "when to use" language in the description.

**Strong trigger language:**
- "Use when..."
- "Use this when..."
- "TRIGGER when..."
- "Trigger when..."
- "Use whenever..."
- "Activate when..."

**Weak or absent trigger language:**
- No "when" clause at all
- Only "Use for X" without specifying conditions
- "Can be used for..." (too passive)

| Situation | Score |
|-----------|-------|
| Clear, explicit trigger conditions with specific scenarios | 3 pts |
| Some trigger guidance but vague ("use when working with X") | 2 pts |
| Minimal trigger guidance (only implied by what the skill does) | 1 pt |
| No trigger conditions at all | 0 pts |

---

### Check 6.2 — Diverse Trigger Keywords (3 pts)

**How to check:** Extract all specific nouns, verbs, and phrases from the description that could match user queries. Count distinct keyword clusters.

**Keyword clusters (examples):**
- PDF skill: "PDF", "forms", "document extraction", "merge", "fill form"
- Testing skill: "test", "Playwright", "browser", "UI", "frontend", "screenshot"
- Brand skill: "brand", "colors", "typography", "style guidelines", "visual formatting"

| Situation | Score |
|-----------|-------|
| 5+ distinct keyword clusters covering different phrasings | 3 pts |
| 3–4 keyword clusters | 2 pts |
| 1–2 keyword clusters | 1 pt |
| No specific keywords (only generic terms) | 0 pts |

---

### Check 6.3 — Appropriate Specificity (2 pts)

**How to check:** Assess whether the description would trigger on unrelated tasks (too broad) or miss relevant tasks (too narrow).

**Too broad examples:**
- "Use for any task involving documents" — would trigger on unrelated document tasks
- "Use whenever the user needs help" — would trigger on everything
- "A general-purpose skill" — meaningless

**Too narrow examples:**
- "Use only when the user says 'validate my PDF form'" — too specific
- "Use for processing the Q4-2024-report.pdf file" — absurdly specific

**Just right:**
- Covers the skill's actual use cases
- Doesn't claim to cover things it can't do
- Doesn't exclude valid use cases

| Situation | Score |
|-----------|-------|
| Specificity is well-calibrated to the skill's actual capabilities | 2 pts |
| Slightly too broad or too narrow but still reasonable | 1 pt |
| Clearly too broad (would trigger on unrelated tasks) | 0 pts |
| Clearly too narrow (would miss many valid use cases) | 0 pts |

---

### Check 6.4 — Appropriate "Pushiness" (2 pts)

**How to check:** Does the description actively encourage triggering, or is it passive?

**The undertriggering problem (from skill-creator):** Claude tends to not use skills when they'd be useful. Descriptions should be slightly assertive to counteract this.

**Passive (undertriggers):**
- "A tool for PDF processing."
- "Helps with brand guidelines."
- "Can be used for web testing."

**Appropriately pushy:**
- "Use whenever the user mentions PDFs, forms, or document extraction — even if they don't explicitly ask for a 'PDF skill'."
- "Make sure to use this skill whenever brand colors, style guidelines, or visual formatting apply."
- "TRIGGER when: code imports `anthropic`/`@anthropic-ai/sdk`, or user asks to use Claude API."

| Situation | Score |
|-----------|-------|
| Description actively encourages triggering with "even if", "whenever", "make sure to use" | 2 pts |
| Description has some assertive language but could be stronger | 1 pt |
| Description is purely passive/descriptive | 0 pts |

---

## Grade Thresholds

| Total Score | Grade | Meaning | Recommended Action |
|-------------|-------|---------|-------------------|
| 90–100 | **Excellent** | Fully compliant, production-ready | Minor polish only |
| 75–89 | **Good** | Meets spec, minor improvements recommended | Address medium-priority items |
| 60–74 | **Acceptable** | Meets minimum spec requirements | Address high and medium priority items |
| 40–59 | **Poor** | Significant spec violations | Rework required before publishing |
| 0–39 | **Critical** | Does not meet spec | Major rewrite needed |

### Status Icons for Score Summary Table

Use these icons in the score summary table based on percentage of max points earned:

| Percentage of Max | Icon |
|-------------------|------|
| ≥ 80% | ✅ |
| 50–79% | ⚠️ |
| < 50% | ❌ |

---

## Edge Cases and Special Situations

### Single-File Skills (No Directory)

When the user provides only a SKILL.md file without a directory:
- **Dimension 1:** Award 4 pts for SKILL.md existing. For Check 1.2 (directory match), award 2 pts and note "Cannot verify directory name — please confirm the skill directory is named to match the `name` field."
- **Dimensions 3–6:** Score normally based on file content.
- **Dimension 5:** Award full points (no optional directories to check).

### GitHub URL Input

When the user provides a GitHub URL:
1. Fetch the directory listing to discover all files
2. Fetch each file's content
3. The "directory name" is the last path segment of the URL
4. Score normally

### Composite Repository (Multiple Skills)

When the repository contains multiple skills (e.g., `skills/pdf/`, `skills/docx/`):
1. Identify all skill directories (each containing a SKILL.md)
2. Validate each skill separately
3. Generate a combined report with:
   - Individual scores for each skill
   - A summary table comparing all skills
   - Overall repository health assessment

### Malformed YAML Frontmatter

When the frontmatter cannot be parsed as valid YAML:
- Flag as CRITICAL FAILURE in Dimension 2
- Attempt to extract values using regex as a fallback
- Note what could and couldn't be parsed
- Score conservatively (assume missing fields are absent)

### Very Large Skills

When a skill has 1000+ lines in SKILL.md:
- Flag as architectural failure
- Still read and score all content
- Provide specific suggestions for what to move to references/ or scripts/

### Skills with Non-Standard Files

When a skill contains files not in the standard directories (e.g., a `LICENSE.txt` at root, a `README.md`, a `CHANGELOG.md`):
- These are generally fine — note them but don't penalize
- Only penalize if they seem to violate the "principle of least surprise"

### Empty Optional Directories

When `scripts/`, `references/`, or `assets/` exists but is empty:
- Flag as a minor warning (why create an empty directory?)
- Award full points for that directory (no content to penalize)
- Suggest either adding content or removing the directory