Hermes Agent Health Check

v1.1.2

Audit a NousResearch/hermes-agent checkout or fork for Hermes-specific runtime-contract drift, command-surface splits, memory/skill/gateway health, and agent...

License: MIT-0

Audit the architecture and health of a Hermes Agent checkout, fork, or deployment support repo.

Hermes Agent has a connected runtime: agent loop, command registry, CLI, TUI, gateway, skills, memory, cron, tools, plugins, and terminal environments. hermescheck helps keep those surfaces aligned.

When to Use

  • You are preparing a Hermes Agent PR and want a repeatable architecture review
  • A Hermes fork works in CLI but not gateway, TUI, skills, cron, or plugins
  • A new slash command risks drifting across surfaces
  • A tool or environment change needs clearer capability boundaries
  • Memory, session search, or skill behavior regressed after a refactor
  • Startup paths or background jobs became hard to reason about

Quick Start

pip install hermescheck
hermescheck /path/to/hermes-agent

Produces audit_results.json and audit_report.md.

The 12-Layer Stack

| # | Layer | What Goes Wrong |
|---|---|---|
| 1 | System prompt | Conflicting instructions, instruction bloat |
| 2 | Session history | Stale context from previous turns |
| 3 | Long-term memory | Pollution across sessions |
| 4 | Distillation | Compressed artifacts re-entering as pseudo-facts |
| 5 | Active recall | Redundant re-summary layers wasting context |
| 6 | Tool selection | Wrong tool routing, model skips required tools |
| 7 | Tool execution | Hallucinated execution: claims to call a tool but doesn't |
| 8 | Tool interpretation | Misread or ignored tool output |
| 9 | Answer shaping | Format corruption in final response |
| 10 | Platform rendering | UI/API/CLI mutates valid answers |
| 11 | Hidden repair loops | Silent fallback/retry agents running a second LLM pass |
| 12 | Persistence | Expired state or cached artifacts reused as live evidence |
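
As an illustration of the persistence layer (12), here is a minimal sketch of a freshness guard that refuses to hand back an expired cached artifact. `CachedArtifact`, `load_fresh`, and the TTL fields are hypothetical names chosen for this example; they are not Hermes Agent or hermescheck types.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedArtifact:
    """Hypothetical cached item; not a Hermes Agent type."""
    value: str
    created_at: float   # epoch seconds when the artifact was written
    ttl_seconds: float  # how long the artifact may be treated as live


def load_fresh(artifact: CachedArtifact, now: Optional[float] = None) -> Optional[str]:
    """Return the cached value only while it is within its TTL, else None."""
    now = time.time() if now is None else now
    if now - artifact.created_at > artifact.ttl_seconds:
        return None  # expired: the caller must re-fetch instead of reusing it
    return artifact.value


art = CachedArtifact("build passed", created_at=1000.0, ttl_seconds=60.0)
print(load_fresh(art, now=1030.0))  # still within the 60 s window
print(load_fresh(art, now=1100.0))  # past the window, so None
```

Returning `None` instead of the stale value forces the caller to handle expiry explicitly, so old state cannot be silently reused as evidence.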

Audit Scanners

| # | Scanner | Severity | What It Catches |
|---|---|---|---|
| 1 | Hardcoded Secrets | critical | API keys, tokens, credentials in source code |
| 2 | Tool Enforcement Gap | high | "Must use tool X" in prompt but no code validation |
| 3 | Hidden LLM Calls | high | Secret second-pass LLM calls in fallback/repair loops |
| 4 | Unrestricted Code Execution | critical | exec(), eval(), subprocess(shell=True) without sandbox |
| 5 | Static Bug Inference | high | Code-level bug patterns inferred without runtime execution |
| 6 | Token Usage Budget | high | Large default context windows, full-history prompts, missing thrift controls |
| 7 | Memory Lifecycle Governance | medium | Memory without types, lifecycle, retrieval budgets, decay, or evidence pointers |
| 8 | RAG Pipeline Governance | medium | Retrieval without chunk, top-k, rerank, ingestion, or context budget controls |
| 9 | Self-Evolution Capability | high | Learning loops without external signals, source reading, constraint fit, safe landing, or verification |
| 10 | Loop Safety Budget | high | Tool/agent loops without max-iteration, retry budget, stuck-job, or duplicate-call controls |
| 11 | Plugin / Remote Tool Boundary | high | Executable plugins and MCP/OpenAPI tools without sandbox, schema, allowlist, or approval boundaries |
| 12 | Output Pipeline Mutation | medium | Response transformation corrupting correct answers |
| 13 | Missing Observability | medium | No tracing, logging, cost tracking, or audit trail |
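
To make the Unrestricted Code Execution scanner concrete, the following sketch shows one way such a check could work using Python's standard `ast` module. It is an assumption about the general technique, not hermescheck's actual implementation; `find_unsafe_calls` and `DANGEROUS_BUILTINS` are hypothetical names.

```python
import ast

# Hypothetical names; this is not hermescheck's actual scanner.
DANGEROUS_BUILTINS = {"exec", "eval"}


def find_unsafe_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, description) pairs for risky calls found in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Bare exec(...) or eval(...) calls
        if isinstance(func, ast.Name) and func.id in DANGEROUS_BUILTINS:
            hits.append((node.lineno, f"{func.id}() call"))
        # Any call passing the literal keyword shell=True, e.g. subprocess.run
        for kw in node.keywords:
            if (
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                hits.append((node.lineno, "call with shell=True"))
    return sorted(hits)


sample = '''
import subprocess
result = eval(user_input)
subprocess.run(cmd, shell=True)
subprocess.run(["ls", "-l"])
'''
for line, what in find_unsafe_calls(sample):
    print(f"line {line}: {what}")
```

A syntactic check like this only flags candidates. Deciding whether a flagged call is actually sandboxed still needs review of the surrounding code.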

Severity Model

| Level | Meaning |
|---|---|
| critical | Agent can confidently produce wrong operational behavior |
| high | Agent frequently degrades correctness or stability |
| medium | Correctness usually survives but output is fragile or wasteful |
| low | Mostly cosmetic or maintainability issues |

Fix Strategy

Default fix order (code-first, not prompt-first):

  1. Code-gate tool requirements: enforce them in code, not just in prompt text
  2. Remove or narrow hidden repair agents: make fallbacks explicit and bound them with contracts
  3. Reduce context duplication: stop the same information arriving through prompt, history, memory, and distillation at once
  4. Tighten memory admission: rank user corrections above agent assertions
  5. Tighten distillation triggers: don't compress what shouldn't be compressed
  6. Reduce rendering mutation: pass answers through instead of transforming them
  7. Convert to typed JSON envelopes: use structured internal flow, not freeform prose
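
Step 1 can be sketched as a small gate that runs after each turn and checks the tools that actually executed, rather than trusting the prompt or the model's own claims. `TurnRecord`, `ToolContractError`, `enforce_required_tools`, and the `session_search` tool name are all hypothetical and used only for illustration; a real Hermes Agent integration would hook into whatever execution log the runtime already keeps.

```python
from dataclasses import dataclass, field


@dataclass
class TurnRecord:
    """Hypothetical record of one agent turn; field names are assumptions."""
    answer: str
    executed_tools: list[str] = field(default_factory=list)


class ToolContractError(RuntimeError):
    """Raised when a turn skips a tool that the contract requires."""


def enforce_required_tools(turn: TurnRecord, required: set[str]) -> TurnRecord:
    # Compare against tools that actually ran, not tools the model says it called.
    missing = required - set(turn.executed_tools)
    if missing:
        raise ToolContractError(f"required tools not executed: {sorted(missing)}")
    return turn


# A turn that really ran the required tool passes through unchanged.
ok = enforce_required_tools(
    TurnRecord(answer="3 results", executed_tools=["session_search"]),
    required={"session_search"},
)

# A turn that only claims to have searched is rejected in code.
try:
    enforce_required_tools(TurnRecord(answer="I searched."), {"session_search"})
except ToolContractError as err:
    print(err)
```

Raising an exception keeps the failure visible to the caller, which fits step 2 as well: any retry or fallback then has to be an explicit decision rather than a silent repair pass.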

Report Schema

Reports follow a formal JSON Schema (see references/report-schema.json) with:

  • overall_health: critical_risk | high_risk | medium_risk | low_risk
  • findings: array of severity-ranked issues with evidence refs
  • maturity_score: positive signal ledger, penalty ledger, score formula, and expected recovery directions
  • ordered_fix_plan: prioritized fix steps with rationale
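
As a rough sketch of how a report could be consumed, the snippet below sorts findings by the severity levels from the Severity Model and prints a short summary. The top-level keys `overall_health` and `findings` come from the list above; the per-finding keys `severity` and `title` are assumptions, so check references/report-schema.json for the real field names.

```python
import json

# Order taken from the Severity Model; the per-finding keys are assumptions.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def summarize(report: dict) -> list[str]:
    """Return one summary line per finding, most severe first."""
    findings = sorted(
        report.get("findings", []),
        # Unknown or missing severities sort after "low".
        key=lambda f: SEVERITY_RANK.get(f.get("severity"), len(SEVERITY_RANK)),
    )
    lines = [f"overall_health: {report.get('overall_health', 'unknown')}"]
    for f in findings:
        lines.append(f"[{f.get('severity', '?')}] {f.get('title', '(untitled)')}")
    return lines


# Hand-written sample for illustration, not real hermescheck output.
sample = json.loads("""
{
  "overall_health": "high_risk",
  "findings": [
    {"severity": "medium", "title": "Missing Observability"},
    {"severity": "critical", "title": "Hardcoded Secrets"}
  ]
}
""")
print("\n".join(summarize(sample)))
```

To run this against a real audit, replace `sample` with the parsed contents of audit_results.json and adjust the per-finding keys to match the schema.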

Anti-Patterns to Avoid

  • ❌ Saying "the model is weak" without falsifying the wrapper first
  • ❌ Saying "memory is bad" without showing the contamination path
  • ❌ Letting a clean current state erase a dirty historical incident
  • ❌ Treating markdown prose as a trustworthy internal protocol
  • ❌ Accepting "must use tool" in prompt text when code never enforces it
