'AI自动化测试平台' (AI Automated Testing Platform)

Security checks across malware telemetry and agentic risk

Overview

The skill's behavior is consistent with an internal AI testing platform, but it grants broad, high-impact authority with weak scoping and unclear data-boundary protections.

Review before installing or deploying. Use only in a tightly isolated internal environment, replace all default credentials and secrets, require a separate admin role for auth/system/backup functions, disable query-string tokens, add owner checks on task/script/report/artifact access, redact data before sending it to DeepSeek (or use a local model instead), and run generated Python/Playwright tests only in disposable sandboxes with minimal network and filesystem access.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (50)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 95% confidence
Finding: The skill describes capabilities to read/write files, access the network, and execute shell-driven workflows, yet it does not declare permissions. That creates a transparency and governance gap: operators may enable or trust the skill without understanding its effective access, which is especially risky because it can generate and run test scripts, invoke external APIs, and manipulate persistent data.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 82% confidence
Finding: The documented scope understates the operational and administrative behaviors exposed by the skill, including broader model management, backup operations, scheduling/executor functions, and direct auth-code administration. This mismatch can cause reviewers and deployers to underestimate the attack surface and inadvertently grant a skill far more authority than its description suggests.

Intent-Code Divergence

Medium

Confidence: 94% confidence
Finding: The guide states that all data must remain internal and that no external uploads are allowed, but it also relies on the external DeepSeek API. This contradiction can lead users to send internal documents, logs, or test data off-premises under a false assumption of data locality, creating confidentiality and compliance risk.

Context-Inappropriate Capability

Medium

Confidence: 90% confidence
Finding: The deployment guide adds API gateway Lua code that performs authorization by directly querying the database, which expands the trusted computing base and introduces a second, inconsistent auth implementation outside the application. In a testing platform, embedding ad hoc auth logic in the gateway is dangerous because the sample concatenates the Authorization header into SQL, creating a likely SQL injection path at the gateway layer.
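The injection risk described above comes from building SQL by string concatenation. The flagged sample is Lua; the following is a minimal Python sketch of the general remedy, a bound query parameter. The table name `auth_codes` and column `code` are assumptions made for illustration, and only `is_active` is named in the finding.

```python
import sqlite3


def is_code_active(conn: sqlite3.Connection, auth_code: str) -> bool:
    """Look up an auth code using a bound parameter, never concatenation."""
    row = conn.execute(
        "SELECT is_active FROM auth_codes WHERE code = ?",  # hypothetical schema
        (auth_code,),
    ).fetchone()
    return bool(row and row[0])


# Demo against an in-memory database with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_codes (code TEXT PRIMARY KEY, is_active INTEGER)")
conn.execute("INSERT INTO auth_codes VALUES ('abc123', 1)")

print(is_code_active(conn, "abc123"))       # a stored, active code
print(is_code_active(conn, "' OR '1'='1"))  # a classic injection payload
```

Because the header value is passed as data rather than spliced into the statement, the injection payload is treated as a literal code that does not exist.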

Intent-Code Divergence

High

Confidence: 95% confidence
Finding: The document claims the database/gateway approach is effectively unbypassable, but the provided gateway sample only checks is_active and omits expiry, permission scope, and usage-count enforcement. This mismatch creates a false sense of security and can allow unauthorized use when operators rely on the incomplete example as a compensating control.
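A check that covers the omitted conditions might look like the sketch below. The `AuthCode` fields and the scope names are hypothetical, since the finding does not show the actual schema; only `is_active` and the broad `'all'` permission appear in this report.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuthCode:
    # Hypothetical record layout; the real columns may differ.
    is_active: bool
    expires_at: Optional[datetime]  # None means the code never expires
    scopes: frozenset
    max_uses: Optional[int]         # None means unlimited uses
    used_count: int


def is_authorized(code: AuthCode, required_scope: str, now: datetime) -> bool:
    """Return True only if every condition holds, not just is_active."""
    if not code.is_active:
        return False
    if code.expires_at is not None and now >= code.expires_at:
        return False
    if code.max_uses is not None and code.used_count >= code.max_uses:
        return False
    return required_scope in code.scopes or "all" in code.scopes


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
expired = AuthCode(True, datetime(2024, 1, 1, tzinfo=timezone.utc),
                   frozenset({"all"}), None, 0)
print(is_authorized(expired, "api_test", now))  # active but expired
```

An active code that has expired is rejected here, which is the case the `is_active`-only sample would let through.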

Description-Behavior Mismatch

High

Confidence: 98% confidence
Finding: The admin API exposes authorization-code creation and verification with no visible authentication or authorization checks before performing privileged actions. Because the endpoint can mint usable auth codes for broad permissions such as 'all', an unauthenticated caller could grant themselves platform access and bypass the intended authorization model.

Context-Inappropriate Capability

Medium

Confidence: 95% confidence
Finding: The progress endpoint returns task status by task_id with no authentication or authorization check, allowing anyone who can guess or obtain a task ID to view execution state and potentially sensitive test metadata. In an internal AI-powered testing platform, task progress and results may reveal scripts, environments, endpoints, or operational details, so the lack of access control is a real information disclosure issue.

Context-Inappropriate Capability

Medium

Confidence: 98% confidence
Finding: The record detail endpoint exposes execution record data solely by record_id and performs no authentication or authorization checks. This creates an insecure direct object reference pattern where unauthorized users can retrieve potentially sensitive test execution details, including internal system behavior, test data, or environment-specific information.

Context-Inappropriate Capability

Medium

Confidence: 98% confidence
Finding: The task progress endpoint returns task status and result data without any authorization check, while generation endpoints tie created content to an auth_code. Because completed task results may include generated test case content, script content, and file paths, an attacker who can guess or obtain a task_id can access another user's sensitive generated artifacts. In an internal AI testing platform handling company documents and automation scripts, this becomes more dangerous because task results may expose proprietary specs, test logic, or internal paths.
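One way to close this gap is an owner check before any task data is returned. The sketch below is illustrative only: the in-memory `TASKS` store, the `owner` field, and the `Forbidden` exception are hypothetical stand-ins for whatever the platform actually uses.

```python
import hmac


class Forbidden(Exception):
    """Raised when the caller may not see the requested task."""


# Hypothetical task store keyed by task_id; "owner" is the creating auth code.
TASKS = {
    "t-1": {"owner": "code-A", "status": "done", "result": "generated script"},
}


def get_task_progress(task_id: str, caller_auth_code: str) -> dict:
    task = TASKS.get(task_id)
    # Missing and not-owned tasks fail identically so that callers cannot
    # probe which task IDs exist.
    if task is None or not hmac.compare_digest(
        task["owner"].encode(), caller_auth_code.encode()
    ):
        raise Forbidden(task_id)
    return {"status": task["status"], "result": task["result"]}


print(get_task_progress("t-1", "code-A")["status"])
```

With this in place, knowing or guessing a `task_id` is no longer enough to read another user's results.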

Intent-Code Divergence

Medium

Confidence: 93% confidence
Finding: The module-level description explicitly mentions authorization management for the platform, but the progress endpoint is implemented without any authorization enforcement. This inconsistency is security-relevant because it creates a false expectation that all related endpoints are protected, increasing the chance that sensitive generation results are exposed unnoticed. In this skill context, authorization gaps are especially risky because the service processes internal company testing materials.

Description-Behavior Mismatch

High

Confidence: 97% confidence
Finding: Several script management endpoints (`get_script`, `update_script`, `delete_script`) do not require or verify any authorization, while adjacent routes rely only on a caller-supplied auth code. This enables unauthorized users to read, modify, or delete test scripts by ID, creating a direct access-control failure in a platform that claims authorization management for internal use.

Description-Behavior Mismatch

High

Confidence: 96% confidence
Finding: Browser configuration and artifact retrieval routes (`configure_browser`, `get_browser_config`, `get_screenshot`, `get_trace_file`) expose sensitive operations and files without authorization checks. An attacker could alter execution settings or download screenshots/traces that may contain internal URLs, credentials, session data, or other sensitive test artifacts.

Description-Behavior Mismatch

High

Confidence: 98% confidence
Finding: The middleware explicitly whitelists `/admin/add_auth` and `/admin/create_auth`, allowing unauthenticated access to endpoints that appear to manage authorization credentials. In an internal AI test platform, exposing auth-management routes publicly can let an attacker mint or alter authorization codes, leading to broad privilege escalation across test generation and execution capabilities.
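A safer allowlist keeps control-plane routes out of the public set and fails at startup if one slips in. This is a sketch under assumptions: `PUBLIC_PATHS`, `/health`, and `/login` are hypothetical, while the two `/admin/...` paths are the ones named in the finding.

```python
# Hypothetical set of routes that may be reached without authorization.
PUBLIC_PATHS = frozenset({"/health", "/login"})

# Startup guard: no admin route may ever be listed as public.
assert not any(p.startswith("/admin") for p in PUBLIC_PATHS)


def requires_auth(path: str) -> bool:
    """Return True unless the normalized path is explicitly public."""
    normalized = "/" + path.strip("/")
    return normalized not in PUBLIC_PATHS


print(requires_auth("/admin/add_auth"))     # admin routes need authorization
print(requires_auth("/admin/create_auth"))
print(requires_auth("/health"))             # explicitly public
```

Defaulting to "requires auth" means a newly added route is protected unless someone deliberately exposes it.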

Intent-Code Divergence

Medium

Confidence: 94% confidence
Finding: The comment and implementation both indicate that listed paths bypass authorization, and that list includes admin endpoints for authorization management. This is not merely a documentation issue; it reflects a real insecure design that lowers protection around the most sensitive control-plane functions in the application.

Context-Inappropriate Capability

High

Confidence: 99% confidence
Finding: This service takes stored script content, writes it to a temporary .py file, and executes it with pytest. Because pytest imports and runs Python code, any user able to create or modify scripts can achieve arbitrary code execution on the server, potentially leading to data theft, lateral movement, environment-secret access, or full host compromise. In an internal AI-powered testing platform, this is especially dangerous because the feature appears normalized as routine test execution.
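The sketch below shows process-level hardening only: a throwaway working directory, a stripped environment so server secrets are not inherited, Python's isolated mode, and a timeout. It is not a sandbox by itself, and a disposable container or VM with restricted network access would still be needed. The function name is hypothetical, and the script is run with the plain interpreter rather than pytest to keep the example dependency-free.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_untrusted_script(script: str, timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Run script text in a child process that inherits no server secrets."""
    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir) / "test_generated.py"
        path.write_text(script, encoding="utf-8")
        return subprocess.run(
            [sys.executable, "-I", str(path)],  # -I: isolated mode
            cwd=workdir,                        # temp dir, deleted afterwards
            env={"PATH": os.defpath},           # drop inherited variables
            capture_output=True,
            text=True,
            timeout=timeout_s,                  # raises TimeoutExpired on overrun
        )


os.environ["SECRET_TOKEN"] = "s3cr3t"  # stands in for a server-side secret
result = run_untrusted_script("import os\nprint(os.environ.get('SECRET_TOKEN'))\n")
print(result.stdout.strip())
```

The child process cannot read `SECRET_TOKEN` from its environment, but it can still reach the filesystem and network, which is why OS-level isolation remains necessary.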

Context-Inappropriate Capability

Medium

Confidence: 89% confidence
Finding: The service returns execution record details including raw logs and, in the detail view, the stored authorization code, with no visible access-control checks or data minimization in this code path. In an internal AI testing platform, logs commonly contain request payloads, tokens, endpoints, and failure traces, so exposing them broadly can leak sensitive operational data and credentials.

Description-Behavior Mismatch

Medium

Confidence: 89% confidence
Finding: The service sends raw execution logs to an external AI provider for failure analysis. Test logs commonly contain sensitive internal data such as stack traces, endpoints, tokens, usernames, file paths, or proprietary code context, so this creates a real confidentiality and data-boundary risk if users or administrators are not explicitly consenting to external sharing. In this internal company testing platform context, the risk is higher because the logs are likely to reflect non-public systems and internal test assets.

Context-Inappropriate Capability

High

Confidence: 92% confidence
Finding: The service exposes database/file backup and restore primitives with no visible authorization checks, scope restrictions, path validation, or safety controls in this layer. In an AI testing platform, these are highly sensitive administrative operations; if reachable by an unintended caller, they could enable destructive restore actions, deletion of backups, or broader compromise of application data and files.
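For the missing path validation, a restore or delete handler could confine caller-supplied names to a dedicated backup directory. This is a minimal sketch; `resolve_backup_path` and the `/tmp/backups` location are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_backup_path(backup_root: Path, name: str) -> Path:
    """Resolve a caller-supplied backup name, rejecting escapes from the root."""
    root = backup_root.resolve()
    candidate = (root / name).resolve()  # collapses ".." and follows symlinks
    if candidate == root or not candidate.is_relative_to(root):
        raise ValueError(f"backup name escapes backup directory: {name!r}")
    return candidate


root = Path("/tmp/backups")  # hypothetical backup directory
print(resolve_backup_path(root, "2025-01-01.sql").name)
```

Names such as `../../etc/passwd` or an absolute path raise `ValueError` instead of being passed to the restore or delete logic. Path validation complements, and does not replace, an authorization check on these operations.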

Description-Behavior Mismatch

Medium

Confidence: 86% confidence
Finding: The document exposes broad administrative capabilities—AI model administration, environment management, audit logs, and backup/restore—that exceed the stated scope of an automated testing skill. Scope expansion like this increases attack surface and can enable misuse of privileged system-management functions if the skill is deployed or trusted based only on its manifest description.

Context-Inappropriate Capability

Medium

Confidence: 90% confidence
Finding: Backup and restore are powerful administrative operations unrelated to ordinary automated test execution and can affect the entire platform state. In this context, undocumented or weakly justified restore capability is dangerous because it could be abused to overwrite data, roll back security state, or access sensitive backups under the guise of a testing tool.

Missing User Warnings

Medium

Confidence: 89% confidence
Finding: Capturing screenshots for every UI test can collect credentials, personal data, internal dashboards, tokens, and other sensitive visual content. Because the skill stores screenshots persistently, this expands the volume of sensitive artifacts and raises the risk of unauthorized access, leakage, or over-retention.

Missing User Warnings

High

Confidence: 97% confidence
Finding: The failure-analysis prompt sends execution logs to an external AI service without warning or controls around sensitive content. Test logs often contain request headers, tokens, stack traces, PII, endpoints, and internal system details, so transmitting them externally can expose secrets and proprietary information.
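A redaction pass before any log leaves the environment could look like the following sketch. The patterns are illustrative assumptions and far from exhaustive; real logs would need patterns tuned to the platform's own token and header formats.

```python
import re

# Illustrative patterns only; extend these for the formats your logs contain.
_PATTERNS = [
    (re.compile(r"(?i)(authorization:\s*bearer\s+)\S+"), r"\1[REDACTED]"),
    (re.compile(r"(?i)\b(password|token|secret|api_key)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
]


def redact(log_text: str) -> str:
    """Mask likely secrets and email addresses in a log before external use."""
    for pattern, replacement in _PATTERNS:
        log_text = pattern.sub(replacement, log_text)
    return log_text


print(redact("Authorization: Bearer abc.def.ghi"))
print(redact("login failed: password=root123 user=alice@example.com"))
```

Pattern-based redaction reduces exposure but cannot guarantee that nothing sensitive remains, so routing failure analysis to a local model is the stronger control where logs are highly sensitive.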

Missing User Warnings

High

Confidence: 99% confidence
Finding: The deployment instructions hardcode a default MySQL root password of root123 and instruct users to set the same value during installation. This creates a predictable administrative credential that is highly likely to be reused in real deployments and can lead to immediate full database compromise.
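Rather than documenting a fixed password, a deployment could read the credential from the environment and refuse to start on a known default. The variable name `DB_PASSWORD`, the deny-list, and the minimum length below are assumptions for illustration; only `root123` comes from the finding.

```python
import os
from typing import Mapping

# Values that must never be accepted; "root123" is the documented default.
KNOWN_DEFAULTS = frozenset({"", "root", "root123", "password"})
MIN_LENGTH = 12  # hypothetical policy


def load_db_password(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured DB password, refusing defaults and short values."""
    password = env.get("DB_PASSWORD", "")  # hypothetical variable name
    if password in KNOWN_DEFAULTS or len(password) < MIN_LENGTH:
        raise RuntimeError("DB_PASSWORD is unset, too short, or a known default")
    return password


try:
    load_db_password({"DB_PASSWORD": "root123"})
except RuntimeError as exc:
    print(f"refused to start: {exc}")
```

Failing fast at startup makes it much harder for the documented default to survive into a real deployment.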

Missing User Warnings

Medium

Confidence: 90% confidence
Finding: The README documents AI-based test generation and automated execution, but it does not clearly warn that uploaded documents or prompts may be sent to an external AI provider, nor that generated API/UI scripts can perform real actions against target systems. In an internal testing platform, this can lead users to expose sensitive company data to third-party services or execute unsafe generated tests without adequate review.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: The helper accepts authorization credentials from the query string as well as the Authorization header. Query parameters are commonly logged by servers, proxies, browsers, monitoring tools, and referrer headers, increasing the chance of token leakage and replay by unauthorized parties.
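A helper that reads credentials only from the Authorization header might be sketched as follows. The `Bearer` scheme and the function signature are assumptions, since the finding does not show the helper's actual interface.

```python
from typing import Mapping, Optional


def extract_token(headers: Mapping[str, str], query: Mapping[str, str]) -> Optional[str]:
    """Return the bearer token from the Authorization header, or None.

    The query mapping is accepted but deliberately ignored, so a token
    placed in the URL is never honored.
    """
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


print(extract_token({"Authorization": "Bearer abc123"}, {}))
print(extract_token({}, {"token": "abc123"}))  # query-string token is ignored
```

Rejecting query-string tokens outright, rather than merely preferring the header, keeps credentials out of access logs, proxies, and referrer headers.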

VirusTotal

67/67 vendors flagged this skill as clean.
