Appendix Table Writer

Security checks across malware telemetry and agentic risk

Overview

The advertised appendix-table helper is mostly local, but the package also ships broad workflow pipelines and runner code that are not clearly disclosed by the skill name or manifest.

Review this as a bundled research-pipeline toolkit, not just an appendix table writer. The documented scripts/run.py path appears suitable for local table generation, but avoid enabling the bundled pipeline router or executor unless you want it to manage broader workspace state and artifacts. Run it only in a project workspace you are willing to modify, and review changes to outline/, output/, UNITS.csv, STATUS.md, and DECISIONS.md.
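One way to review those changes is to hash the named paths before and after a run and compare the two snapshots. The sketch below is illustrative only: `snapshot` and `diff_snapshots` are hypothetical helper names, not part of the skill, and the watched list simply repeats the paths mentioned above.

```python
import hashlib
from pathlib import Path

# Paths named in the review guidance; adjust for your own workspace.
WATCHED = ["outline", "output", "UNITS.csv", "STATUS.md", "DECISIONS.md"]


def snapshot(workspace: Path) -> dict:
    """Map each watched file's workspace-relative path to its SHA-256 digest."""
    digests = {}
    for name in WATCHED:
        target = workspace / name
        if target.is_file():
            files = [target]
        elif target.is_dir():
            files = sorted(p for p in target.rglob("*") if p.is_file())
        else:
            files = []  # path does not exist yet
        for path in files:
            rel = path.relative_to(workspace).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(before: dict, after: dict) -> tuple:
    """Return sorted lists of (added, removed, modified) relative paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, modified
```

Take one snapshot before invoking the skill and one after, then inspect the three lists. A version-control diff gives the same information if the workspace is already tracked.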

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
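To make the Behavioral AST row concrete, the following is a minimal sketch of how calls such as exec(), eval(), and dynamic imports can be located with Python's standard `ast` module. It is not SkillSpector's implementation, which this report does not describe. The function name `find_risky_calls` and the two lookup sets are assumptions chosen for illustration.

```python
import ast

# Builtins and module attributes treated as risky in this sketch (assumed list).
RISKY_BUILTINS = {"exec", "eval", "__import__"}
RISKY_ATTRS = {
    ("importlib", "import_module"),
    ("subprocess", "run"),
    ("subprocess", "Popen"),
}


def find_risky_calls(source: str) -> list:
    """Return sorted (line, name) pairs for risky calls found in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in RISKY_BUILTINS:
            findings.append((node.lineno, func.id))
        elif (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and (func.value.id, func.attr) in RISKY_ATTRS
        ):
            findings.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(findings)
```

This only matches direct names such as `subprocess.run(...)`. Aliased imports (`import subprocess as sp`) or calls reached through `getattr` would need additional handling.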

Findings (17)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    log_path = workspace / log_rel
    try:
        completed = subprocess.run(cmd, check=False, capture_output=True, text=True)
        if completed.stdout or completed.stderr or completed.returncode != 0:
            ensure_dir(log_path.parent)
            body = [
Confidence: 95%
Finding: completed = subprocess.run(cmd, check=False, capture_output=True, text=True)
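The flagged call is not unsafe by itself; the risk depends on where `cmd` comes from. As a hedged illustration, a runner could restrict execution to an explicit allowlist and add a timeout. `run_allowed` and its `allowed` parameter are hypothetical and do not appear in the scanned code.

```python
import subprocess
import sys
from pathlib import Path


def run_allowed(cmd, allowed, timeout=60):
    """Run cmd only if it is an argument list whose executable is allowlisted.

    Executables are compared by resolved path, so callers should pass
    absolute paths for both cmd[0] and the allowlist entries.
    """
    if isinstance(cmd, str) or not cmd:
        raise ValueError("cmd must be a non-empty argument list, not a shell string")
    exe = Path(cmd[0]).resolve()
    if exe not in {Path(a).resolve() for a in allowed}:
        raise PermissionError(f"executable not allowlisted: {cmd[0]}")
    # Same keyword arguments as the flagged call, plus a timeout and no shell.
    return subprocess.run(
        list(cmd),
        check=False,
        capture_output=True,
        text=True,
        timeout=timeout,
        shell=False,
    )


# Usage: allow only the current Python interpreter.
result = run_allowed([sys.executable, "-c", "print('ok')"], allowed=[sys.executable])
```

Passing an argument list with `shell=False` avoids shell interpretation of the arguments, and the allowlist limits which programs a workflow file can start.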

Lp3

Medium

Category: MCP Least Privilege
Confidence: 88%
Finding: The skill advertises no declared permissions, yet its documented behavior includes reading workspace files, writing `outline/tables_appendix.md`, emitting `output/TABLES_APPENDIX_REPORT.md`, and invoking `python scripts/run.py`, which implies shell/code execution capability. This creates a trust and policy gap: reviewers or runners may treat the skill as low-privilege when it actually modifies files and executes code, increasing the chance of unintended file changes or abuse if the script or workspace is compromised.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 93%
Finding: The public description frames the skill as simple appendix-table curation with no invented facts and no network use, but the actual behavior also generates an audit artifact, runs validation/quality-gate logic, and depends on broader repository tooling. That mismatch can mislead operators about side effects, failure modes, and trust boundaries, which is dangerous because hidden validation code or auxiliary outputs can expose data, break pipelines, or perform actions beyond what the user consented to.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: The file content is a full graduate-thesis reconstruction pipeline, which materially differs from the declared skill purpose of writing appendix survey tables. This kind of scope/identity mismatch is dangerous because an orchestrator or user may invoke the skill expecting a narrow, low-risk formatting task but instead expose broad editing, restructuring, citation, and build workflows across the whole thesis repository.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The documented capabilities include workspace initialization, chapter reconstruction, TeX writeback, compilation review, frontmatter sync, and citation enhancement, all far beyond what an appendix-table writer should need. Excess capability increases blast radius: if routed incorrectly, the skill could rewrite major thesis content, alter references, or trigger broad project changes under the guise of a small appendix formatting task.

Intent-Code Divergence

High

Confidence: 99%
Finding: The file explicitly states it is not a survey pipeline and instead defines a graduate-paper pipeline, directly contradicting the manifest's survey-table framing. This contradiction is a strong indicator of configuration drift or mislabeled functionality, which undermines trust boundaries and can cause unsafe delegation of repository-wide authority to a supposedly narrow skill.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: This file is a full research-idea brainstorming pipeline, not an appendix-table writer. That mismatch is security-relevant because routing or invoking the skill under the appendix-table-writer identity could trigger materially broader behavior than the user authorized, including retrieval, ideation, screening, and memo generation. In agent systems, this kind of scope drift can bypass least-privilege expectations and cause unintended data access or unsafe autonomous actions.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The pipeline includes literature retrieval and broad ideation stages even though the advertised skill should only transform already-available C4 artifacts into appendix tables. This expanded capability increases attack surface and can lead to unauthorized retrieval that resembles network access, scope expansion, or pressure to fabricate content beyond the user's requested task. In a constrained skill context, unjustified capability broadening is dangerous because users and orchestrators may trust the narrower contract.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The inline text explicitly says the pipeline is not for writing a survey draft, which directly conflicts with the surrounding skill identity focused on publishable appendix survey tables. This contradiction is dangerous because it signals the artifact may be misplaced or mislabeled, making policy enforcement and operator review unreliable. While less severe than active broad capabilities, it reinforces that the skill cannot be trusted to perform only its advertised function.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: This file is a generic workflow executor that edits unit state, approvals, logs, checkpoints, reroute state, and downstream invalidation rather than a narrow appendix-table formatter. That scope mismatch is dangerous because a supposedly low-risk content skill is actually capable of driving arbitrary pipeline execution and mutating control-plane artifacts across the workspace.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: Using subprocess execution inside a skill advertised as 'Network: none' and intended for reader-facing appendix tables is unjustified capability expansion. In this context, any execution primitive is more dangerous because users would reasonably assume a formatting skill cannot trigger arbitrary code paths or external workflow runners.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The code can automatically mark HUMAN checkpoints approved when checkpoint names appear in auto_approve, bypassing the intended human gate. That is a real security issue because approval state is a control mechanism, and bypassing it undermines separation of duties and lets automated runs promote workflow units without actual review.
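A guard against this bypass can be sketched as follows, assuming a checkpoint record that carries an owner field. The `Checkpoint` class, its `owner` values, and `may_auto_approve` are hypothetical; only the name `auto_approve` comes from the finding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    name: str
    owner: str  # "HUMAN" or "AUTO" (hypothetical field for this sketch)


def may_auto_approve(checkpoint: Checkpoint, auto_approve: set) -> bool:
    """Never auto-approve a HUMAN checkpoint, even if its name is listed."""
    if checkpoint.owner == "HUMAN":
        return False
    return checkpoint.name in auto_approve


# Usage: "C2" and "lint" are made-up checkpoint names.
human_gate = Checkpoint(name="C2", owner="HUMAN")
auto_gate = Checkpoint(name="lint", owner="AUTO")
```

With this rule, listing a HUMAN checkpoint in `auto_approve` has no effect, so approval state can only change through an explicit reviewer action.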

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The skill mutates UNITS.csv, STATUS.md, DECISIONS.md, and checkpoint state, which are workflow-control artifacts unrelated to generating appendix tables. In the given skill context this increases risk because the implementation has broad state-changing authority that could be abused to alter execution history or force pipeline progression.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: This file implements a full ideation and research-brainstorm pipeline that produces memos, rankings, shortlists, and discussion artifacts rather than the appendix survey tables promised by the skill manifest. In an agent setting, this is dangerous because the skill can silently perform materially different work than requested, causing scope hijack, incorrect outputs, and downstream misuse of generated artifacts under false assumptions about purpose and guardrails.

Intent-Code Divergence

High

Confidence: 98%
Finding: The generated report text explicitly states that the output is 'not a survey draft,' which directly contradicts a skill advertised for producing reader-facing appendix survey tables. This contradiction is a strong indicator of capability/intent mismatch and increases the risk that the agent will substitute the wrong artifact type while operators continue trusting the manifest and its table-specific guardrails.

Vague Triggers

Medium

Confidence: 92%
Finding: The pipeline is marked as `routing_default: true` with broad routing hints such as `review`, `survey`, and `literature review`, which can cause it to be selected for generic requests outside the narrow arXiv-survey use case. Because this pipeline orchestrates many downstream skills and produces large numbers of artifacts, misrouting can trigger unintended processing, tool use, or workflow execution on requests that did not warrant this high-privilege survey pipeline.
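A reviewer could lint for this pattern before enabling a pipeline. The sketch below assumes the pipeline definition has been loaded into a dictionary and that the hints live under a `routing_hints` key; that key name and the function `routing_warnings` are assumptions, while `routing_default` and the three hint strings come from the finding.

```python
# Hints the finding calls out as broad; extend as needed.
GENERIC_HINTS = {"review", "survey", "literature review"}


def routing_warnings(pipeline: dict) -> list:
    """Warn when a default-routed pipeline also matches generic hints."""
    hints = {h.strip().lower() for h in pipeline.get("routing_hints", [])}
    generic = sorted(hints & GENERIC_HINTS)
    if pipeline.get("routing_default") and generic:
        return ["default route with generic hints: " + ", ".join(generic)]
    return []


# Usage with a made-up pipeline definition.
example = {
    "routing_default": True,
    "routing_hints": ["review", "survey", "literature review", "arxiv survey"],
}
```

Narrower hints, or leaving the pipeline off the default route, would clear the warning.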

Missing User Warnings

High

Confidence: 95%
Finding: The `pre_retrieval_shell` block enables shell execution while `approval_surface: false` suppresses visible approval or warning at the pipeline layer. In a pipeline that coordinates many skills and artifacts, hidden shell capability increases the risk of unreviewed command execution, especially if the pipeline is misrouted or invoked on adversarial inputs.
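The same kind of pre-flight check can flag shell blocks that hide their approval prompt. This sketch assumes `approval_surface` is a key inside each shell block and that shell blocks are identifiable by a `_shell` suffix. Both are guesses about the configuration layout, and `hidden_shell_blocks` is a hypothetical name.

```python
def hidden_shell_blocks(pipeline: dict) -> list:
    """Return names of shell blocks that explicitly disable the approval surface."""
    flagged = []
    for key, block in pipeline.items():
        if not isinstance(block, dict):
            continue  # skip scalar settings
        if key.endswith("_shell") and block.get("approval_surface") is False:
            flagged.append(key)
    return sorted(flagged)


# Usage with a made-up pipeline definition; "command" is an illustrative key.
config = {
    "routing_default": True,
    "pre_retrieval_shell": {"command": "echo placeholder", "approval_surface": False},
}
```

Any flagged block is a candidate for either removal or a visible approval step before the pipeline is enabled.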

VirusTotal

57 of 57 vendors reported this skill as clean.
