Security audit

Chapter Briefs

Security checks across malware telemetry and agentic risk

Overview

The advertised chapter-brief helper is narrow, but the installed package also contains unrelated routable research pipelines and workflow-control tooling that need review before use.

Install only if you intentionally want this broader research-pipeline toolkit, not just the chapter-brief helper. If you only need chapter briefs, use the documented scripts/run.py path in a controlled workspace and avoid enabling or routing the bundled pipeline files until they are separated, scoped, or reviewed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (19)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    log_path = workspace / log_rel
    try:
        completed = subprocess.run(cmd, check=False, capture_output=True, text=True)
        if completed.stdout or completed.stderr or completed.returncode != 0:
            ensure_dir(log_path.parent)
            body = [
Confidence: 94%
Finding: completed = subprocess.run(cmd, check=False, capture_output=True, text=True)
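A common mitigation for this pattern is to accept only an argument list (never a shell string), check the executable against an allowlist, and bound the run time before calling `subprocess.run`. The sketch below is not taken from the audited package; the `ALLOWED_EXECUTABLES` set and the `run_checked` helper are hypothetical, and the allowlist here permits only the running Python interpreter.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Hypothetical allowlist: in this sketch only the running Python interpreter may be launched.
ALLOWED_EXECUTABLES = {Path(sys.executable).resolve()}


def run_checked(cmd: list[str], timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run cmd only if its executable resolves to an allowlisted path (hypothetical helper)."""
    if not isinstance(cmd, list) or not cmd or not all(isinstance(a, str) for a in cmd):
        raise ValueError("cmd must be a non-empty list of strings")
    resolved = shutil.which(cmd[0])
    if resolved is None or Path(resolved).resolve() not in ALLOWED_EXECUTABLES:
        raise PermissionError(f"executable not allowlisted: {cmd[0]!r}")
    # A list argument with the default shell=False means no shell parses the arguments;
    # the timeout bounds how long a misbehaving child process can run.
    return subprocess.run(cmd, check=False, capture_output=True, text=True, timeout=timeout)


result = run_checked([sys.executable, "-c", "print('ok')"])
print(result.returncode, result.stdout.strip())  # 0 ok
```

Any command whose first element does not resolve to the interpreter raises `PermissionError` before a process is started, which keeps the `check=False` call from becoming a general command launcher.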

Lp3

Medium

Category: MCP Least Privilege
Confidence: 90%
Finding: The skill declares no explicit permissions, but its own instructions require reading local files, writing an output file, and invoking Python via shell commands. That mismatch creates a trust and enforcement gap: a host may treat the skill as low-privilege while it actually performs filesystem and command execution actions, increasing the risk of unintended file modification or abuse if the skill is repurposed or its inputs are adversarial.
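One way to surface this gap is to compare the permissions a skill declares with the capabilities its instructions or code are observed to use. The report does not show the skill's manifest format, so the capability labels (`fs.read`, `fs.write`, `process.exec`) and the `undeclared_capabilities` helper below are hypothetical; this is a minimal sketch of the comparison, not a host's actual enforcement logic.

```python
def undeclared_capabilities(declared: set[str], observed: set[str]) -> set[str]:
    """Return capabilities a skill uses but does not declare (hypothetical labels)."""
    return observed - declared


# Mirrors the finding: nothing is declared, yet the instructions read local files,
# write an output file, and invoke Python through shell commands.
declared: set[str] = set()
observed = {"fs.read", "fs.write", "process.exec"}
print(sorted(undeclared_capabilities(declared, observed)))  # ['fs.read', 'fs.write', 'process.exec']
```

A non-empty result is exactly the trust and enforcement gap described above: the host would treat the skill as low-privilege while it needs filesystem and process access.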

Description-Behavior Mismatch

High

Confidence: 99%
Finding: The file defines a full `idea-brainstorm` pipeline even though the surrounding skill metadata claims the skill is `chapter-briefs`. This is a capability/intent mismatch: a caller expecting a constrained chapter-briefing skill could instead trigger literature retrieval, routing, checkpointing, and report-generation behavior outside the declared scope, undermining trust and policy enforcement.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: This section introduces materially broader capabilities than a chapter-briefs skill should have, including literature retrieval, human approval gates, memo synthesis, self-looping, and final report generation. In context, that means the skill can be used as a disguised multi-stage research pipeline, enabling scope escalation and unexpected artifact creation that operators and downstream policy may not anticipate.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The inline documentation explicitly states the pipeline is not for writing a survey draft, while the advertised `chapter-briefs` skill is for shaping survey chapters. This contradiction is dangerous because it signals that the loaded content does not match the user-facing contract, increasing the chance of misrouting, misuse, or hidden behavior under an innocuous label.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: The file content materially contradicts the declared skill metadata: instead of a narrow chapter-brief generator, it defines a full systematic-review pipeline with unrelated artifacts, stages, and skills. This kind of skill/package mismatch is dangerous because routing, approval, and operator expectations are based on the advertised purpose, so a user invoking a harmless writing-planning skill could trigger much broader research and synthesis behavior.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The pipeline orchestrates retrieval, screening, extraction, and synthesis steps that far exceed the stated purpose of generating non-prose chapter briefs. This scope expansion can cause unauthorized workflow execution, unnecessary data handling, and misleading outputs, especially because the surrounding skill context promises a constrained, local, no-invention planning tool.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The documentation explicitly describes a PRISMA-style systematic review workflow and even marks Stage 5 as 'PROSE ALLOWED', directly conflicting with the skill's 'NO PROSE' guardrail and chapter-brief intent. Contradictory instructions weaken operator trust and make it more likely that the wrong capability is selected or that downstream controls are bypassed due to ambiguity.
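Because the conflict is between two literal markers, it can be caught mechanically before a skill is loaded. The sketch below assumes the guardrail text and the bundled stage documentation are available as plain strings and that the markers appear verbatim as `NO PROSE` and `PROSE ALLOWED`; `find_prose_conflicts` and the file names are hypothetical, not part of the audited package.

```python
import re

NO_PROSE = re.compile(r"\bNO PROSE\b")
PROSE_ALLOWED = re.compile(r"\bPROSE ALLOWED\b")


def find_prose_conflicts(skill_doc: str, bundled_docs: dict[str, str]) -> list[str]:
    """List bundled files that allow prose when the skill-level guardrail forbids it."""
    if not NO_PROSE.search(skill_doc):
        return []  # no guardrail, so nothing to contradict
    return sorted(name for name, text in bundled_docs.items() if PROSE_ALLOWED.search(text))


skill_doc = "Guardrail: NO PROSE. Output per-chapter briefs only."
bundled = {
    "pipelines/systematic-review.md": "Stage 5: synthesis (PROSE ALLOWED)",  # hypothetical file
    "pipelines/chapter-briefs.md": "Stage 1: build briefs",                 # hypothetical file
}
print(find_prose_conflicts(skill_doc, bundled))  # ['pipelines/systematic-review.md']
```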

Description-Behavior Mismatch

High

Confidence: 99%
Finding: This file is not implementing a narrow chapter-brief builder; it is a general-purpose pipeline executor that mutates unit status, approvals, logs, checkpoints, and downstream task state. That scope mismatch is dangerous because users invoking a harmless-sounding writing skill may unknowingly grant it authority over the broader workflow state.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: Launching pipeline execution is unjustified for a local, no-network chapter-brief skill and materially expands its authority. The mismatch between declared purpose and actual behavior increases the risk of hidden execution paths, unintended file mutation, and abuse through crafted workspace state.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The code can automatically mark HUMAN checkpoints as approved by editing DECISIONS.md whenever a checkpoint appears in auto_approve_set. This bypasses the intended human review controls, enabling workflow escalation and unauthorized progression of tasks under the guise of a chapter-brief tool.
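A minimal guard against this pattern is to make auto-approval refuse any checkpoint whose gate type is HUMAN, regardless of what `auto_approve_set` contains. The `Checkpoint` record, its `gate` field, and the `may_auto_approve` function below are hypothetical; the sketch does not reproduce how the audited code edits DECISIONS.md, only the check that should precede any such edit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    # Hypothetical record; the real checkpoint format is not shown in the report.
    checkpoint_id: str
    gate: str  # e.g. "HUMAN" or "AUTO"


def may_auto_approve(cp: Checkpoint, auto_approve_set: set[str]) -> bool:
    """Allow auto-approval only for non-HUMAN gates listed in auto_approve_set."""
    if cp.gate.upper() == "HUMAN":
        return False  # human review can never be satisfied by configuration alone
    return cp.checkpoint_id in auto_approve_set


auto_approve_set = {"C1", "C2"}  # hypothetical checkpoint IDs
print(may_auto_approve(Checkpoint("C1", "HUMAN"), auto_approve_set))  # False
print(may_auto_approve(Checkpoint("C2", "AUTO"), auto_approve_set))   # True
```

Putting the HUMAN check first means that adding a human checkpoint to the set has no effect, so the control cannot be bypassed by editing configuration.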

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The code applies quality-gate, reroute, and cutover logic that governs the global pipeline rather than chapter-brief content generation. While less severe than direct execution or approval bypass, it still grants the skill authority to block, redirect, and shape unrelated workflow behavior.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: This file implements a different capability than the declared skill: research ideation, direction ranking, memo generation, and report assembly rather than per-chapter H2 briefs. That kind of scope drift is dangerous because users and orchestrators may grant this skill trust, invocation conditions, and data access based on the chapter-briefs description while the code performs materially different work.

Intent-Code Divergence

High

Confidence: 95%
Finding: The skill-level guardrail says 'NO PROSE', but these functions generate long-form markdown memos, appendices, rankings, and narrative summaries. Contradicting an explicit safety/behavioral constraint is risky because downstream systems may rely on the no-prose promise to control output shape, token budget, or review workflow, and the code silently violates that contract.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The code synthesizes, scores, ranks, and shortlists new research directions, which exceeds the stated purpose of building chapter briefs from existing mapped subsection and paper data. This is dangerous because it permits uncontrolled content invention and decision-shaping behavior in a workflow that explicitly says not to invent papers and is supposed to remain constrained to structural writing support.

Scope Creep

Medium

Confidence: 88%
Finding: The file reads and writes a wide range of workspace artifacts such as pipeline specs, queries, briefs, reports, appendices, and trace outputs beyond what the skill metadata suggests. Broader-than-declared file-system interaction increases the risk of unauthorized data access, unexpected mutation of workspace state, and misleading operators about the skill's side effects.
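Side effects like these are easier to bound if every path is resolved against one explicit output directory and rejected when it escapes it. The sketch below assumes a hypothetical `briefs/` output directory and a hypothetical `confined_path` helper; it relies on `Path.is_relative_to`, which requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path


def confined_path(root: Path, relative: str) -> Path:
    """Resolve relative under root; refuse anything that escapes root (hypothetical helper)."""
    root = root.resolve()
    candidate = (root / relative).resolve()
    # is_relative_to (Python 3.9+) rejects '..' traversal and absolute paths outside root.
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes {root}: {relative!r}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "briefs"  # hypothetical output directory for chapter briefs
    out_dir.mkdir()
    print(confined_path(out_dir, "chapter_01.md").name)  # chapter_01.md
    try:
        confined_path(out_dir, "../pipeline.yaml")
    except PermissionError as exc:
        print("blocked:", exc)
```

Because the candidate is resolved before the check, a symlink inside the output directory that points elsewhere in the workspace is rejected as well.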

Vague Triggers

Medium

Confidence: 83%
Finding: The pipeline is marked as `routing_default: true` and uses broad review-oriented `routing_hints` such as `survey`, `review`, and generic multilingual equivalents. In an agentic system that auto-selects skills from user prompts, this can cause the pipeline to trigger for loosely related requests, unexpectedly launching a large multi-stage workflow that produces many artifacts and may consume substantial local resources or alter workspace state.

Vague Triggers

Low

Confidence: 87%
Finding: The routing hints include broad terms like 'peer review', 'review report', and 'referee', which can match many generic user requests unrelated to this specific pipeline. That increases the chance of misrouting and accidental activation, causing the agent to run the wrong workflow against user content or artifacts. In this context the impact is limited because the pipeline is offline and focused on document review, but it can still degrade reliability and lead to inappropriate processing of manuscript or workspace files.

Vague Triggers

Medium

Confidence: 83%
Finding: The routing hints include very broad terms like "tutorial" and "教程" (Chinese for "tutorial"), which can match a wide range of ordinary user requests and lack strong invocation constraints. This increases the chance that the pipeline is selected unintentionally, causing the agent to enter a multi-stage, artifact-producing workflow when a simpler or safer skill should have handled the request.
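The three trigger findings share one mechanism: a single generic keyword is enough to select a large pipeline. A minimal sketch of a stricter router ignores `routing_default` for automatic selection and requires at least two hint matches before choosing a pipeline. The `Pipeline` record and `route` function are hypothetical, only the field names echo the findings, the pipeline names and hints other than "survey", "review", "tutorial", and "教程" are invented for illustration, and the threshold of two is arbitrary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pipeline:
    # Hypothetical record; only the field names echo the findings above.
    name: str
    routing_hints: list[str] = field(default_factory=list)
    routing_default: bool = False


def route(prompt: str, pipelines: list[Pipeline], min_matches: int = 2) -> Optional[str]:
    """Return the pipeline with the most hint matches, or None below min_matches.

    routing_default is deliberately ignored here, so a default pipeline still needs
    explicit hint matches. Matching is plain case-insensitive substring search,
    which also works for hints in languages written without spaces.
    """
    text = prompt.lower()
    best: Optional[str] = None
    best_score = 0
    for p in pipelines:
        score = sum(1 for hint in p.routing_hints if hint.lower() in text)
        if score >= min_matches and score > best_score:
            best, best_score = p.name, score
    return best


pipelines = [
    Pipeline("systematic-review", ["systematic review", "prisma", "survey", "review"], routing_default=True),
    Pipeline("tutorial", ["tutorial", "教程", "step-by-step guide"]),
]
print(route("Can you review my cover letter?", pipelines))                 # None
print(route("Run a PRISMA systematic review of agent papers", pipelines))  # systematic-review
print(route("写一个教程", pipelines))                                       # None
```

A lone "review" or "教程" no longer launches a multi-stage workflow; whether two matches is the right bar would need tuning against real prompts.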

VirusTotal

64/64 vendors reported this skill as clean.


Static analysis

No suspicious patterns detected.