Large Model Visual Question Answering Skill (大模型视觉问答技能)

Security checks across malware telemetry and agentic risk

Overview

This skill is advertised as image question answering, but its files also include broader account, history, video/health-analysis, credential persistence, and backend mutation capabilities that need review before use.

Install only if you are comfortable with a broader service integration than image Q&A. Use only media and identifiers you are authorized to submit, and review the remote service, local token database, history/report access, and dependency issue before deploying it in a shared or sensitive workspace.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (28)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill invokes shell commands, reads configuration files, writes local files, accesses environment paths, and calls remote APIs, yet it declares no permissions. This under-specification prevents proper policy enforcement and informed user consent, and is especially risky here because the workflow includes local file handling and network access tied to user identifiers and history retrieval.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96%
Finding: The documented purpose is image Q&A, but the behavior extends into account/login flows, token or profile persistence, and retrieval of prior records and report links. This mismatch can mislead reviewers and users about the true data-handling scope, increasing the chance that sensitive identity and historical analysis data are collected or exposed without meaningful consent.

Intent-Code Divergence

Medium

Confidence: 87%
Finding: The document forbids assuming or generating open-id values, but the examples still show a concrete placeholder value that operators may copy into real use. In this skill, open-id is tied to saving and querying history, so encouraging reuse of a shared identifier can lead to cross-user data mixing, unauthorized history access, or accidental attribution of records to the wrong account.
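
One way to reduce the copy-paste risk described in this finding is to reject placeholder-looking identifiers before any history call is made. The following is a minimal sketch only: the function name `validate_open_id`, the placeholder set, and the accepted character pattern are all assumptions for illustration and are not taken from the skill's code or documentation.

```python
import re

# Hypothetical placeholder values. The concrete example value used in the
# skill's documentation is not reproduced here.
PLACEHOLDER_OPEN_IDS = {"your-open-id", "example", "test", "demo"}

# Assumed format: 6 to 64 letters, digits, underscores, or hyphens.
_OPEN_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{6,64}$")


def validate_open_id(open_id: str) -> str:
    """Return the trimmed open-id, or raise ValueError if it looks unsafe."""
    value = open_id.strip()
    if not value:
        raise ValueError("open-id is required and must be supplied by the user")
    if value.lower() in PLACEHOLDER_OPEN_IDS:
        raise ValueError("open-id looks like a documentation placeholder")
    if not _OPEN_ID_PATTERN.fullmatch(value):
        raise ValueError("open-id has an unexpected format")
    return value
```

A check like this does not replace server-side authorization; it only stops an obviously shared or example identifier from being attached to real records.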

Intent-Code Divergence

High

Confidence: 95%
Finding: The file documents pet health analysis and report export APIs, which materially conflicts with the declared purpose of a visual Q&A skill. This kind of scope mismatch is dangerous because it can hide undisclosed data flows, unexpected backend capabilities, or accidental cross-domain access to sensitive health/reporting functions under an unrelated skill identity.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The referenced endpoints include health analysis results, historical report pagination, and full report export rather than visual question answering. Exposing or bundling report retrieval/export capabilities in a mismatched skill increases the risk of unauthorized access to sensitive animal health data and leads operators and users to make trust decisions based on false functionality claims.

Description-Behavior Mismatch

High

Confidence: 94%
Finding: The skill is described as performing visual question answering, but this service also exposes generic record-management operations including listing, creation, editing, and deletion. That mismatch creates an unnecessary and risky expansion of capability: an agent or caller expecting only analysis could potentially enumerate or mutate backend data through undocumented administrative-style endpoints.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The delete method allows destructive backend mutation via a cameraSn identifier, which is not justified by the declared image Q&A purpose. In an agent setting, this kind of hidden destructive capability can be abused to remove records or device-related data if the tool is invoked improperly or exposed to untrusted prompts/workflows.

Context-Inappropriate Capability

High

Confidence: 95%
Finding: The add and edit methods enable backend state changes unrelated to open-ended image analysis, increasing the attack surface beyond what users and orchestrators would expect from a visual QA skill. Such undocumented mutation capabilities can be misused to tamper with records, poison data, or perform unauthorized administrative actions through the agent interface.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The paginated page/list operations expose backend record enumeration capabilities that are not clearly necessary for answering questions about images. Even without direct mutation, unauthorized listing can disclose metadata, system contents, or operational history that broadens reconnaissance opportunities for an attacker.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The skill includes a history-listing capability tied to an open_id even though its stated purpose is image Q&A. This expands the data surface from single-request analysis to retrieval of prior user activity, which can expose past prompts/results if authorization and privacy controls are weak or if callers can supply another user's identifier.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The reference API documentation materially conflicts with the skill's declared purpose of image-based visual Q&A by instead describing video upload/URL ingestion and returning face-analysis and health/diagnostic outputs. This kind of scope mismatch is dangerous because it can conceal undeclared collection and processing of biometric and health-related data, causing users and integrators to grant permissions or send data under false assumptions.

Intent-Code Divergence

High

Confidence: 97%
Finding: The documented response includes constitution assessment, organ-condition judgments, color analysis, and health warnings/suggestions, which are medical or quasi-medical outputs unrelated to ordinary VQA. Presenting a general visual-analysis skill as one capable of diagnosis creates risk of unsafe medical reliance, regulatory exposure, and covert inference of sensitive health attributes from uploaded media.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The implementation materially diverges from the manifest: the skill is described as open-ended image VQA, but the code handles video URLs, local video files, and report retrieval/export flows. This kind of capability mismatch is dangerous because users and policy layers may grant access expecting low-risk image Q&A, while the skill actually performs remote media submission and broader analysis operations.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The code accepts arbitrary local file paths and uploads file contents to a remote analysis service, which is inconsistent with a simple image question-answering skill. This creates a real risk of unintended data exfiltration of local media because a caller may believe they are invoking a benign VQA function rather than transmitting local files off-host.
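
A reviewer evaluating a fix for this finding might look for a path check that runs before any upload. The sketch below shows one possible shape, assuming the operator designates a single directory from which media may be sent. `resolve_uploadable_image`, `ALLOWED_SUFFIXES`, and the `allowed_root` parameter are hypothetical names and do not come from the skill.

```python
from pathlib import Path

# Assumed set of image extensions for an image-only Q&A skill.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def resolve_uploadable_image(path_str: str, allowed_root: Path) -> Path:
    """Return the resolved path only if it is an image file under allowed_root."""
    root = allowed_root.resolve()
    # resolve() collapses ".." segments and follows symlinks.
    candidate = Path(path_str).expanduser().resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise PermissionError(f"{candidate} is outside the allowed directory")
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"{candidate.suffix or 'no extension'} is not an allowed image type")
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    return candidate
```

Resolving the path before comparing it to the root is the important step, because a raw string comparison would let `..` segments or symlinks escape the allowed directory.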

Description-Behavior Mismatch

High

Confidence: 94%
Finding: The implementation performs video analysis and exposes video-history functionality, which materially differs from the manifest's stated purpose of image-based visual Q&A. In an agent ecosystem, this scope mismatch is dangerous because it can mislead users and policy layers into granting access to capabilities and data flows they did not intend to authorize, including processing of richer media and access to prior analysis records.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The history-listing capability is unrelated to open-ended image Q&A and may expose prior user activity or analysis metadata through a broadly named --open-id parameter. This code path shows no authorization logic of its own, so exposing account/history retrieval in a mismatched skill increases the risk of privacy violations and unauthorized data access if downstream checks are weak or absent.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: This file exposes broad generic CRUD and raw HTTP helper methods that are not scoped to the stated visual Q&A purpose of the skill. In an agent-skill context, such capabilities can be reused to contact arbitrary endpoints or perform unintended state-changing operations, expanding the attack surface well beyond image question-answering.
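
To illustrate how a raw HTTP helper could be scoped, the sketch below checks every outbound URL against a host allow-list before a request is sent. The host `api.example.com` and the function name `check_outbound_url` are placeholders chosen for this example; the skill's actual service host is not assumed here.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list. Replace with the single service host the
# operator has reviewed and approved.
ALLOWED_HOSTS = {"api.example.com"}


def check_outbound_url(url: str) -> str:
    """Return url unchanged if it targets an approved HTTPS host, else raise."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https URLs are permitted")
    # hostname excludes any userinfo and port, so a URL such as
    # https://api.example.com@other.host/ is judged by its real host.
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"host {host!r} is not on the allow-list")
    return url
```

Calling a check of this kind at the top of each request helper keeps a generic transport layer from being reused against arbitrary endpoints.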

Description-Behavior Mismatch

Medium

Confidence: 82%
Finding: This skill is described as visual Q&A, but the file implements a generic persistence layer for user-account data and automatic schema mutation. That mismatch materially increases suspicion because the code can create and modify local databases and user records outside the declared image-analysis purpose, expanding the skill's capability and data-handling surface without clear justification. In the context of an agent skill, undeclared persistence can enable covert collection or retention of user metadata.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: The model stores usernames, emails, birthdays, and especially token/open_token values even though the skill is advertised only as visual Q&A. Persisting authentication-like secrets and personal identifiers creates unnecessary sensitive-data exposure; if the local SQLite database is accessed by another process or copied from disk, those credentials and user records may be compromised. Because this data handling is not obviously required for image Q&A, the context makes it more dangerous and harder for users to anticipate.
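
If token persistence is kept at all, one partial mitigation for the exposure described here is to restrict the database file to its owner. The sketch below assumes a POSIX system and uses a hypothetical helper named `open_private_db`; it does not reflect how the skill currently opens its database, and it does not protect against processes running as the same user.

```python
import os
import sqlite3
import stat


def open_private_db(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database whose file is readable and writable by the owner only."""
    # Create the file with mode 0600 before SQLite touches it, so it never
    # exists with wider default permissions.
    fd = os.open(db_path, os.O_CREAT | os.O_RDWR, 0o600)
    os.close(fd)
    # Tighten the mode again in case the file already existed with looser bits.
    os.chmod(db_path, stat.S_IRUSR | stat.S_IWUSR)
    return sqlite3.connect(db_path)
```

On Windows, `os.chmod` only toggles the read-only flag, so a different access-control mechanism would be needed there.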

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The request utility does substantially more than generic HTTP transport for an image Q&A skill: it auto-provisions accounts via /sys/phoneLogin, retrieves tokens, and persists them through a DAO. That behavior expands the skill's privilege and data-handling scope beyond the stated purpose, creating undisclosed identity, account, and credential-management risk if triggered implicitly during normal use.

Vague Triggers

Medium

Confidence: 84%
Finding: Broad automatic trigger phrases for history lookup can cause the skill to query prior records when the user did not clearly request that action. Because history retrieval involves remote access to stored past Q&A tied to an open-id, unintended invocation can disclose sensitive prior analysis metadata or cause unnecessary data processing.

Missing User Warnings

Medium

Confidence: 94%
Finding: The tool requires an open_id/user identifier and uses it to fetch user-specific data, but provides no meaningful privacy notice, consent flow, or minimization. In a CLI or agent context, this can lead to unnecessary collection and transmission of personal identifiers and enables misuse if operators submit someone else's identifier.

Missing User Warnings

Medium

Confidence: 90%
Finding: The API accepts video files and public video URLs and the sample output includes face detection and health-related inference, yet the documentation provides no privacy notice, consent guidance, retention policy, or warning about biometric/sensitive data processing. That omission increases the chance that users will upload identifiable or health-linked media without understanding the risks, leading to privacy harm and compliance violations.

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill reads local file contents and sends them to a remote analysis API without any user-facing disclosure in this code path. Even if the upload is intended, the lack of transparency increases the chance of users unknowingly exposing sensitive media or metadata to an external service.

Missing User Warnings

Medium

Confidence: 78%
Finding: These CRUD wrappers enable outbound POST-based add, edit, and delete operations without any visible restriction, confirmation, or purpose limitation. In a skill whose declared function is visual Q&A, hidden state-changing network actions are higher risk because they could be repurposed to modify remote resources unrelated to the user’s intent.
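
The missing restriction noted in this finding could be addressed with a gate that refuses state-changing actions unless the caller has explicitly confirmed them. This is a sketch under assumptions: `run_action`, `MUTATING_ACTIONS`, and the `confirmed` flag are illustrative names, and the real wrappers may be structured differently.

```python
from typing import Callable

# Assumed names of the state-changing operations, taken from the
# add/edit/delete wording of the finding.
MUTATING_ACTIONS = {"add", "edit", "delete"}


def run_action(action: str, handler: Callable[[], object], confirmed: bool = False) -> object:
    """Run handler, but require explicit confirmation for mutating actions."""
    if action in MUTATING_ACTIONS and not confirmed:
        raise PermissionError(
            f"'{action}' changes remote state and needs explicit user confirmation"
        )
    return handler()
```

Read-only actions pass through unchanged, while `run_action("delete", handler)` raises until the caller passes `confirmed=True`, which gives an agent runtime a clear point at which to ask the user.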

VirusTotal

66/66 vendors flagged this skill as clean.
