qa-req2testcase-generator

Security checks across malware telemetry and agentic risk

Overview

This skill can generate QA test cases, but it also has automatic cloud push, credential persistence, broad file discovery, and package-install behavior that users should review before installing.

Install only if you are comfortable with this skill searching local folders for requirement files, installing Python packages, storing API credentials, and sending requirement-derived summaries or test cases to the configured remote HTTP review service. Prefer disabling cloud push/sync, removing the hardcoded API key, requiring explicit file paths, and using a controlled environment before use.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (64)

os.system() or os exec-family call

High

Category: Dangerous Code Execution
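Content (excerpt, begins partway through a try block; indentation reconstructed):
        from lxml import etree
    except ImportError:
        print("❌ 依赖缺失，正在安装 python-docx lxml ...")
        os.system(f"{sys.executable} -m pip install python-docx lxml -q")
        from docx import Document
        from docx.opc.constants import RELATIONSHIP_TYPE as RT
        from docx.oxml.ns import qn
(The printed message translates to "Dependency missing, installing python-docx lxml ...".)
Confidence: 97%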
Finding: os.system(f"{sys.executable} -m pip install python-docx lxml -q")
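
One lower-risk alternative to installing packages through a shell at import time is to detect the missing dependency and stop with an actionable message, leaving installation to the user. The following is a minimal sketch, not the skill's actual code; the helper names `missing_packages` and `require_dependencies` are hypothetical, and the module-to-package mapping is taken from the flagged snippet.

```python
import importlib.util

# Import name -> pip package name, as seen in the flagged snippet.
REQUIRED_MODULES = {"docx": "python-docx", "lxml": "lxml"}


def missing_packages(required=REQUIRED_MODULES):
    """Return pip package names whose import modules cannot be found."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


def require_dependencies(required=REQUIRED_MODULES):
    """Raise with install instructions instead of installing anything."""
    missing = missing_packages(required)
    if missing:
        raise RuntimeError(
            "Missing dependencies: " + ", ".join(missing)
            + ". Install them yourself, e.g. pip install " + " ".join(missing)
        )
```

This keeps the host environment unchanged and avoids building a shell command string at all.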

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93%
Finding: The skill documentation indicates effective capabilities including shell, file read/write, environment access, networking, and orchestration, but it does not declare permissions or present corresponding user-facing constraints. This creates a transparency and control gap: a caller may invoke a seemingly simple testcase generator that can also access local resources and transmit data externally.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The declared purpose is requirement-to-testcase generation, but the documented behavior extends into OCR/image understanding, cloud sync, review pushing, report export, state management, external HTTP-facing services, and other orchestration features. This mismatch is dangerous because users and upstream policy may grant trust based on the narrow description while the skill actually handles broader data flows and external side effects.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The changelog explicitly documents broad filesystem searching across user directories and copying discovered documents into the task directory. For a testcase-generation skill, this is excessive data access and can expose unrelated sensitive files if discovery heuristics misfire or are abused. The mismatch between declared purpose and host-level file discovery materially increases the risk.

Context-Inappropriate Capability

High

Confidence: 93%
Finding: Automatic package-manager-based OCR installation introduces software modification and command execution capabilities unrelated to normal testcase generation. This can change the host environment unexpectedly, expand attack surface, and enable abuse of package managers in contexts where the skill should be read/process-only.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The changelog shows undisclosed external integrations for cloud review push, cloud knowledge sync, and image-analysis APIs. Network egress and remote data transfer are sensitive capabilities, especially when the skill handles requirement documents and generated test assets that may contain proprietary business information.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The configuration enables outbound connectivity to a remote review/sync service and includes an embedded API key, even though the skill is described as a requirement-to-testcase generator rather than a networked review client. This creates an unnecessary external data flow that could exfiltrate user requirements, generated test cases, or related project data to a third-party host over insecure HTTP, substantially increasing confidentiality and integrity risk.
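
A configuration loader along the following lines would address both the embedded key and the plain-HTTP endpoint. This is a sketch under stated assumptions: the environment variable names `REVIEW_SERVICE_URL` and `REVIEW_API_KEY` and the function `load_review_config` are hypothetical and do not come from the skill.

```python
import os
from urllib.parse import urlparse


def load_review_config(env=os.environ):
    """Read review-service settings from the environment; nothing is hardcoded.

    Returns None (cloud push disabled) unless both values are set.
    """
    url = env.get("REVIEW_SERVICE_URL", "")
    api_key = env.get("REVIEW_API_KEY", "")
    if not url or not api_key:
        return None
    parsed = urlparse(url)
    # Refuse plain HTTP so requirement data is not sent in cleartext.
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("review service URL must use https")
    return {"url": url, "api_key": api_key}
```

With this shape, the remote push is off by default and only becomes active when the operator supplies both settings explicitly.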

Intent-Code Divergence

Low

Confidence: 86%
Finding: The comment says experience sync is only a reserved future feature, but the configuration already sets it to enabled, which is misleading and can conceal active network functionality from reviewers and operators. This mismatch increases the chance that sensitive content is transmitted unexpectedly because maintainers may believe the feature is dormant when it is not.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The prompt defines conflicting control semantics for blocking issues: one part says any blocking issue must yield recommended_action = BLOCKED, while the later flow says execution must continue automatically and blockers are converted into PCI items. This ambiguity can cause downstream agents or orchestration logic to make inconsistent decisions, leading to tests being generated despite unresolved blockers or, conversely, workflows stopping unexpectedly.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The statement 'as long as there are blocking issues, recommended_action is BLOCKED' directly conflicts with the later instruction to auto-skip blockers and continue non-interactively. In an agentic pipeline, contradictory instructions are dangerous because they create non-deterministic behavior, weaken guardrails, and can silently bypass quality gates that operators expect to be enforced.
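
One way to remove the contradiction is to encode a single precedence rule in code rather than in prose. The sketch below assumes blockers always win, which is only one possible resolution; the value `PROCEED` and the function names are hypothetical, while `BLOCKED` and `recommended_action` come from the finding.

```python
def recommended_action(blocking_issues):
    """Any blocking issue yields BLOCKED; there is no auto-skip path."""
    return "BLOCKED" if blocking_issues else "PROCEED"


def next_step(blocking_issues):
    """Decide deterministically whether test generation may run."""
    action = recommended_action(blocking_issues)
    return {
        "recommended_action": action,
        "generate_tests": action != "BLOCKED",
        # Unresolved blockers are surfaced, never silently converted.
        "open_items": list(blocking_issues),
    }
```

If the maintainers instead intend non-interactive runs to continue, the same function is the place to state that explicitly, so that only one rule exists.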

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The prompt contains conflicting state requirements for low-quality inputs: one section requires `CONDITIONAL_PASS` when score < 0.7, while another output/status definition elsewhere appears narrower. Contradictory instructions in a control prompt are dangerous because downstream agents may choose different branches unpredictably, causing gate bypasses, malformed output, or silent loss of quality signals relied on by later stages.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The file forbids fabrication, yet the example includes a business rule explicitly marked as derived by inference (`推断`, meaning "inferred"). In an AI skill that structures requirements for later test generation, this inconsistency can normalize invented facts, leading downstream stages to treat speculation as authoritative requirements and generate incorrect or unsafe test assets.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The onboarding flow instructs the agent to install packages and mutate the local environment (`pip install openpyxl`) as part of normal execution. That exceeds the narrow scope of converting requirements into test cases and creates unnecessary supply-chain and integrity risk, especially if package sources, permissions, or environments are not tightly controlled.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The skill performs broad shell-based discovery across multiple directories (`find ~/.openclaw ~/.local ~/skills /app/skills`) to locate files. This increases exposure to unrelated filesystem contents and normalizes wide environment probing that is not strictly required for generating test cases from a requirement payload.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The onboarding logic reads and may create or modify persistent user preference files (`user_knowledge/preferences.json`) before performing the core task. Persistently altering user state without clear necessity or separate consent expands the blast radius of the skill and can cause unwanted cross-session behavior changes.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The skill directs ingestion of uploaded documents into a long-lived project knowledge base, which goes beyond one-shot requirement analysis. This creates data retention and confidentiality risk because sensitive project materials may be stored and reused across future tasks without strong boundaries or explicit lifecycle controls.

Intent-Code Divergence

High

Confidence: 97%
Finding: The document claims direct manual `exec` instructions should not be followed, yet it repeatedly embeds direct `exec` commands throughout the file. This contradiction is a red flag because it can mislead reviewers and operators about the skill's true behavior while still attempting to drive command execution, increasing the chance of unsafe execution of adversarial instructions.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The prose says Step 4 must not continue when any of P2/P3/P4 inputs are missing, but the actual gate-check treats P3 and P4 as optional warnings. This creates a policy/implementation mismatch that can silently produce incomplete or degraded P5 output while operators believe all prerequisites were enforced, undermining workflow integrity and downstream test quality.
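
A gate check that matches the stated policy would treat all three inputs as mandatory. The following is a minimal sketch, assuming the inputs are tracked in a dictionary keyed by stage name; `check_step4_gate` and that dictionary shape are hypothetical.

```python
REQUIRED_INPUTS = ("P2", "P3", "P4")


def check_step4_gate(available):
    """Raise unless every required input is present and non-empty.

    `available` maps stage names to their outputs (assumed shape).
    """
    missing = [name for name in REQUIRED_INPUTS if not available.get(name)]
    if missing:
        raise RuntimeError(
            "Step 4 blocked; missing inputs: " + ", ".join(missing)
        )
    return True
```

If P3 and P4 are meant to be optional, the prose should say so; either way the code and the documentation should state the same rule.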

Intent-Code Divergence

Medium

Confidence: 89%
Finding: Step 7 states that Excel export must only be performed via the approved script, but later permits fallback behaviors that disclose equivalent testcase content through other channels, including inline Markdown. This creates a policy contradiction that weakens output controls and can bypass the intended guardrails around structured export.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The learning flow accepts user-uploaded modified Excel files, derives lessons from them, and stores the results in a persistent knowledge base for future runs. This extends the skill from one-shot generation into cross-session data retention and reuse, creating privacy, consent, and data-governance risk beyond the declared purpose.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The onboarding flow adds an online review-platform push feature that is not necessary for a requirements-to-testcase generator and expands the skill's data-handling scope. This creates unnecessary data exfiltration risk because generated test cases and possibly requirement-derived content may be transmitted to an external platform without being core to the declared function.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The skill requests a shared password to enable external image-understanding API access, which broadens the capability beyond straightforward testcase generation and introduces credential and data-transfer risk. Because requirement documents may contain sensitive business information, sending derived or raw image content to an external service can expose confidential material outside the user's expected processing boundary.

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: The skill instructs the agent to search broad local filesystem locations, including common user folders and the home directory, to locate documents based only on recency and file extension. This exceeds what is necessary for a requirement-to-testcase workflow and can cause unauthorized discovery and processing of unrelated sensitive files, especially because the agent then copies the first match into its workspace.
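
Requiring an explicit path that must resolve inside the task workspace avoids this kind of discovery altogether. The sketch below illustrates the idea; the function name `resolve_requirement_file` and the allowed-extension set are assumptions, not part of the skill. It relies on `Path.is_relative_to`, available from Python 3.9.

```python
from pathlib import Path

# Assumed set of requirement-file extensions; adjust to the real inputs.
ALLOWED_SUFFIXES = {".docx", ".md", ".txt", ".pdf"}


def resolve_requirement_file(user_path, workspace):
    """Accept only a user-named file that lives inside the workspace."""
    root = Path(workspace).resolve()
    candidate = Path(user_path)
    if not candidate.is_absolute():
        candidate = root / candidate
    candidate = candidate.resolve()  # collapses ".." and symlinks
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{candidate} is outside the workspace")
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {candidate.suffix}")
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    return candidate
```

No directory is ever scanned: the caller names the file, and anything outside the workspace is rejected before it is read or copied.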

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The skill requests a user-provided password/API key and passes it directly to an orchestrator command for image understanding, despite the skill being described as requirement-to-testcase generation rather than credential-mediated external access. This creates unnecessary credential handling risk, including exposure through logs, process arguments, shell history, or downstream components that the user has not meaningfully vetted.
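
Passing the secret through the child process environment rather than the command line reduces exposure through process listings and shell history, although it does not make the downstream component any more trustworthy. A minimal sketch follows; the variable name `IMAGE_API_KEY`, the function `build_invocation`, and the command shown in the test are hypothetical.

```python
import os


def build_invocation(base_command, secret, env_name="IMAGE_API_KEY"):
    """Return (argv, env) with the secret in env only, never in argv."""
    argv = list(base_command)
    if secret in argv:
        raise ValueError("secret must not appear in the command line")
    child_env = dict(os.environ)  # copy; the parent environment is untouched
    child_env[env_name] = secret
    return argv, child_env
```

The returned pair can be handed to `subprocess.run(argv, env=child_env, check=True)`, with the child reading the key from its environment.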

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The rule file expands the skill from local requirement-to-testcase generation into cloud review pushing and references cloud knowledge synchronization, which materially broadens data handling beyond the stated purpose. This creates an implicit data exfiltration and scope-creep risk because requirement artifacts and generated test cases may be transmitted to remote services without clear user consent, minimization, or boundary controls.

VirusTotal

65/65 vendors flagged this skill as clean.
