Visual Summary Smart Narration Skill (视觉摘要智述技能)

Security checks across malware telemetry and agentic risk

Overview

This visual-summary skill is not proven malicious, but it needs review because it sends media and user identifiers to cloud services, can create/login accounts and store tokens locally, and exposes historical report data.

Install only if you are comfortable sending images or videos and a user identifier to the provider's cloud service. Avoid using a phone number as the open-id if an opaque account ID is available, and ask the publisher to document or remove silent account creation/login, local token storage, historical report export links, and health-style analysis behavior before using it with sensitive media.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (22)

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: The skill advertises visual summarization but also auto-triggers cloud history-report queries, which is a separate data-access function. This broadens the data surface from analyzing current user-provided media to retrieving prior account-linked outputs, potentially exposing past reports unexpectedly.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The skill requires obtaining an open-id or API key from local configuration or user input, even though its stated purpose is media description. Pulling identifiers from config and tying them to report storage/query operations introduces account linkage and possible secret handling risks that are not justified or transparently disclosed by the manifest.

Intent-Code Divergence

High

Confidence: 98%
Finding: The documented API response describes face detection, constitution analysis, organ-condition assessment, and health warnings, which are materially different from the declared skill purpose of visual scene summarization. This mismatch creates a strong risk of undisclosed sensitive biometric/health inference and deceptive data handling, especially when users expect only neutral scene description from uploaded videos or images.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: The response schema explicitly indicates medical/health-style outputs such as diagnosis, organ condition, complexion interpretation, and health suggestions rather than simple scene description. In the context of a visual-summary skill, this indicates scope drift into sensitive inference from visual data, which can expose users to privacy harms, unconsented profiling, and potentially unsafe pseudo-medical conclusions.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill is described as generating visual scene descriptions, but this API wrapper also exposes generic record-management operations such as add, edit, list, page, and delete. This unnecessary expansion of capability increases attack surface and can enable unauthorized modification or removal of backend analysis records if the skill or its callers are misused.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The delete(cameraSn) method provides direct deletion capability even though the skill's stated purpose is analysis of images/video into descriptions. In this context, destructive functionality is unjustified and particularly risky because a compromised or misused skill could delete records by identifier, causing integrity loss and possible service disruption.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill exposes report-listing and export-link retrieval capabilities that go beyond the declared purpose of generating a scene description from provided media. This expands the skill's data-access surface and may let a caller enumerate prior analysis records or obtain report URLs unrelated to the current user request, increasing the risk of unauthorized data exposure if upstream access control is weak or absent.

Description-Behavior Mismatch

Low

Confidence: 86%
Finding: The code accepts arbitrary http/https URLs and forwards them as analysis input, which can turn the backend service into a fetcher for attacker-chosen remote content. Depending on how the downstream analysis service handles URLs, this can enable SSRF-style access to internal resources, unexpected network egress, or analysis of content the user never directly supplied.
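One way a caller could narrow this exposure is to screen URLs before forwarding them. The following is a minimal sketch, not the skill's actual code: the function name `is_safe_media_url` is hypothetical, and the check covers only URL schemes and IP-literal hosts. A production check would also need to resolve hostnames and re-validate the resulting addresses.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_media_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is not obviously internal.

    Hypothetical helper for illustration; it does not resolve DNS names.
    """
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Malformed URL (for example, a broken IPv6 literal).
        return False
    if parts.scheme not in ("http", "https"):
        return False
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A fuller check would resolve the name
        # and apply the same address test to every result.
        return True
    # Rejects loopback, private, link-local, and other non-public ranges.
    return addr.is_global
```

Under these assumptions, a link-local metadata address or a `file://` path is rejected before it ever reaches the analysis backend, while ordinary public media URLs pass through.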

Context-Inappropriate Capability

Medium

Confidence: 87%
Finding: This file implements a broad generic API client with create, edit, delete, pagination, and arbitrary HTTP verb wrappers, which materially exceeds the declared purpose of a visual scene-description skill. In an agent-skill context, this expands the skill from narrow media analysis into a reusable remote-action surface that could be invoked for unrelated data access or state-changing operations, increasing abuse potential and reducing auditability.
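A narrower client surface could be enforced with a small allowlist check in front of every outgoing request. This is a sketch under stated assumptions: the host `api.example.com`, the permitted methods, and the names `check_request` and `RequestNotAllowed` are all hypothetical and do not come from the skill's code.

```python
from urllib.parse import urlsplit

# Hypothetical policy: the only host and HTTP methods a media-analysis client needs.
ALLOWED_HOSTS = {"api.example.com"}
ALLOWED_METHODS = {"GET", "POST"}


class RequestNotAllowed(Exception):
    """Raised when a request falls outside the declared scope."""


def check_request(method: str, url: str) -> None:
    """Raise RequestNotAllowed unless both the method and the host are allowlisted."""
    verb = method.upper()
    if verb not in ALLOWED_METHODS:
        raise RequestNotAllowed(f"method {verb} is outside the declared scope")
    host = urlsplit(url).hostname
    if host not in ALLOWED_HOSTS:
        raise RequestNotAllowed(f"host {host!r} is not allowlisted")
```

With a guard like this, state-changing verbs such as DELETE or PUT fail loudly instead of being silently available to any caller of the wrapper, which also makes the skill's real capabilities easier to audit.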

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The code behavior does not match the manifest description: instead of a specialized visual-analysis client, it exposes a general-purpose remote resource management interface. This mismatch is dangerous because reviewers and users may trust the skill as low-risk media processing while the implementation can perform broader remote operations, enabling hidden capability creep.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: This skill is described as visual scene analysis, yet the module creates a persistent local SQLite database and provides generic CRUD operations unrelated to that purpose. Such hidden persistence expands the attack surface, enables unnecessary retention of user data, and makes it easier for the skill to collect or correlate information outside its declared function.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The User model stores account-style data including username, email, token, and open_token, which is not justified by a visual summarization skill. Persisting tokens materially increases the risk of credential theft, account takeover, and cross-context tracking if the local database is accessed by other components or exposed through backups/logs.
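A reviewer who wants to confirm what such a database holds can list the columns whose names suggest stored credentials. The sketch below uses only the standard `sqlite3` module. The function name `find_sensitive_columns` and the hint list are hypothetical, and the table name `user` in the usage example is an assumption, since the report names the fields but not the table.

```python
import sqlite3

# Hypothetical substrings that suggest a column holds authentication material.
SENSITIVE_HINTS = ("token", "secret", "password", "api_key")


def find_sensitive_columns(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (table, column) pairs whose column name hints at stored credentials."""
    hits = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # Escape embedded double quotes so the table name is a safe identifier.
        quoted = table.replace('"', '""')
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{quoted}")'):
            column = row[1]
            if any(hint in column.lower() for hint in SENSITIVE_HINTS):
                hits.append((table, column))
    return hits


if __name__ == "__main__":
    # Usage example against an in-memory database with the fields named in the finding.
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT, "
        "email TEXT, token TEXT, open_token TEXT)"
    )
    print(find_sensitive_columns(demo))
```

This only inspects the schema. It does not read or print the stored values, so it can be run without further exposing any tokens.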

Description-Behavior Mismatch

High

Confidence: 97%
Finding: This utility code performs hidden account bootstrap, login, token retrieval, and local token persistence in a skill described as visual content analysis. That is a significant scope mismatch: it collects and transmits user identifiers, creates accounts automatically, and stores authentication material without clear user consent or necessity for the stated function, increasing privacy, account-abuse, and unauthorized service-use risk.

Context-Inappropriate Capability

Medium

Confidence: 86%
Finding: The code injects a payment/recharge workflow into HTTP error handling, which is unrelated to visual analysis and indicates undisclosed monetization or account-balance dependency. This is risky because it can manipulate user behavior and conceal true backend failures. It also confirms that the skill is wired into broader account/payment systems beyond its stated purpose.

Vague Triggers

Medium

Confidence: 85%
Finding: A broad default trigger for common image/video description requests can invoke the skill without sufficiently specific user intent. In this context, unintended activation matters because the skill may save files locally and send content to a cloud API, causing privacy-impacting actions on ordinary requests.

Vague Triggers

Medium

Confidence: 89%
Finding: The auto-trigger keywords for historical report queries are overly broad, so routine phrases about 'history' or 'reports' may cause retrieval of account-linked cloud records without a clear request. Because prior reports may contain sensitive generated outputs and links, accidental invocation can expose more data than intended.
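The difference between a broad trigger and an explicit one can be illustrated with a short comparison. The skill's real keyword list is not shown in this report, so every phrase and pattern below is hypothetical. The sketch only demonstrates why keyword matching on words like "history" or "report" fires on unrelated requests.

```python
import re

# Hypothetical trigger phrases; the skill's real keyword list is not shown in this report.
BROAD_KEYWORDS = ("history", "report")

# Hypothetical stricter rule: an action verb, the skill name, then the history feature.
EXPLICIT_PATTERN = re.compile(
    r"\b(show|list|get|open)\b.*\bvisual[- ]summary\b.*\b(history|reports?)\b",
    re.IGNORECASE,
)


def broad_trigger(message: str) -> bool:
    """Fire on any message containing a broad keyword (the risky pattern)."""
    text = message.lower()
    return any(word in text for word in BROAD_KEYWORDS)


def explicit_trigger(message: str) -> bool:
    """Fire only when the user names an action, the skill, and the history feature."""
    return EXPLICIT_PATTERN.search(message) is not None
```

A message such as "Can you summarize the history of the Roman Empire for my report?" satisfies the broad rule but not the explicit one, which is the kind of accidental invocation both trigger findings describe.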

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill description does not clearly warn users that uploaded media and identifiers are transmitted to a cloud API and that returned report links may expose generated outputs. Missing disclosure undermines informed consent and can lead users to share sensitive video/image data without understanding retention, access, or link-sharing implications.

Missing User Warnings

Medium

Confidence: 87%
Finding: The API accepts direct video uploads and publicly accessible video URLs but provides no warning about privacy exposure, retention, third-party access, or the sensitivity of visual content. This is more dangerous here because the same document suggests face and health-style analysis, meaning users may unknowingly submit highly sensitive biometric data to an external service.

Missing User Warnings

Medium

Confidence: 92%
Finding: The skill reads local file contents and uploads them, or submits a remote media URL, to an external analysis API without any in-file disclosure, consent flow, or visible warning to the caller. This creates a privacy and data-handling risk because users may reasonably assume analysis is local or may not realize sensitive media is being transmitted to a third party.
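A consent step before any upload would address the missing disclosure. The following is a minimal sketch, assuming a hypothetical `confirm_upload` helper that the caller runs before transmitting anything. The prompt wording and the injectable `ask` parameter are illustrative choices and do not describe the skill's current behavior.

```python
from typing import Callable


def confirm_upload(media_ref: str, endpoint: str,
                   ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly approves sending media_ref to endpoint.

    `ask` defaults to the built-in input() and can be replaced in tests or UIs.
    """
    prompt = (
        f"This will send '{media_ref}' to the external service at {endpoint}. "
        "Type 'yes' to continue: "
    )
    # Anything other than an explicit "yes" is treated as a refusal.
    return ask(prompt).strip().lower() == "yes"
```

Defaulting to refusal means an empty or ambiguous answer never results in media leaving the machine, and naming the endpoint in the prompt tells the user that analysis is not local.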

Missing User Warnings

Medium

Confidence: 83%
Finding: The delete method is a generic remote state-changing wrapper that can issue deletion requests to arbitrary supplied URLs without any visible safety checks, confirmation, or scope restriction in this module. In the context of a skill advertised for visual summarization, hidden delete capability is particularly concerning because it enables destructive external actions unrelated to the stated purpose.

Missing User Warnings

Medium

Confidence: 91%
Finding: The module creates and alters a local database file automatically, with no disclosure in the skill's stated purpose. Undisclosed file writes and schema changes are risky in an agent skill because they introduce persistent side effects, can surprise operators, and may store sensitive data beyond the user's expectation.

Missing User Warnings

Medium

Confidence: 95%
Finding: The code sends HTTP requests with user-linked identifiers and authentication headers, and may create/recover tokens, without any non-debug disclosure, consent prompt, or normal-user warning. In a visual-summary skill, silent credential and identity handling is more dangerous because users would reasonably expect media analysis, not hidden account and auth operations.

VirusTotal

VirusTotal findings are pending for this skill version.
