Video Search and Retrieval Intelligent Analysis Skill

Security checks across malware telemetry and agentic risk

Overview

This skill needs review because it is presented as video search but includes sensitive face and health analysis, account and token handling, and broad backend access.

Install only after reviewing the face-analysis and shared smyx_common files, and do not use it with real personal, employee, medical, or biometric video unless you understand what remote service receives the data, how tokens and reports are stored, and who can list, export, or delete records.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (28)

Dynamic attribute access via getattr()

Low

Category: Dangerous Code Execution
Content:
    if filters:
        for key, value in filters.items():
            query = query.filter(getattr(self.__model__, key) == value)
    if offset:
        query = query.offset(offset)
Confidence: 77%
Finding: query = query.filter(getattr(self.__model__, key) == value)

Dynamic attribute access via getattr()

Low

Category: Dangerous Code Execution
Content:
        if filters:
            for key, value in filters.items():
                query = query.filter(getattr(self.__model__, key) == value)
        return query.scalar()
    finally:
Confidence: 74%
Finding: query = query.filter(getattr(self.__model__, key) == value)
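
Both getattr() findings come down to caller-supplied filter keys reaching the model unchecked. One common mitigation is an allow-list applied before the query is built. The sketch below is only an illustration: `ALLOWED_FILTER_FIELDS`, its column names, and `validate_filters` are hypothetical and do not appear in the scanned skill.

```python
# Hypothetical allow-list; the report does not show the model's real column names.
ALLOWED_FILTER_FIELDS = frozenset({"id", "status", "created_at"})


def validate_filters(filters, allowed=ALLOWED_FILTER_FIELDS):
    """Return a copy of filters, rejecting any key outside the allow-list."""
    if not filters:
        return {}
    unknown = sorted(set(filters) - set(allowed))
    if unknown:
        raise ValueError(f"unsupported filter field(s): {', '.join(unknown)}")
    return dict(filters)
```

Calling a check like this before the `getattr(self.__model__, key)` loop would keep keys such as `__class__` or unrelated columns from ever reaching attribute lookup.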

Lp3

Medium

Category: MCP Least Privilege
Confidence: 97%
Finding: The skill declares no permissions, yet its documented behavior requires environment access, local file read/write, network access, and shell execution. This undermines any permission or trust model built around the skill and prevents users or the platform from making an informed decision before allowing sensitive operations such as reading config files, saving attachments, and invoking Python scripts that contact external APIs.
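
The gap described in this finding can be pictured as a set difference between what the skill needs and what it declares. The sketch below is a rough illustration: the capability labels and the `undeclared_permissions` helper are invented for this example and do not correspond to any real manifest schema.

```python
# Illustrative capability labels taken from the finding's wording; not a real manifest schema.
REQUIRED_CAPABILITIES = {"env", "fs_read", "fs_write", "network", "shell"}


def undeclared_permissions(declared, required=REQUIRED_CAPABILITIES):
    """Return the capabilities the skill uses but does not declare, sorted by name."""
    return sorted(set(required) - set(declared))
```

For a skill that declares nothing, `undeclared_permissions([])` returns every label in the required set, which is the situation the finding reports.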

Tp4

High

Category: MCP Tool Poisoning
Confidence: 98%
Finding: The documented purpose is video search, but the detected behavior includes automatic account login/creation, local token persistence, listing historical records, and use of unrelated generic analysis infrastructure. That mismatch is dangerous because it can cause users to authorize a seemingly narrow media-analysis skill that actually handles identity, credentials, and persistent data outside the stated scope, increasing the chance of covert data collection or misuse.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The README describes a Traditional Chinese Medicine face-analysis and health-diagnosis tool, while the skill metadata says the skill is for video search and retrieval. This kind of cross-skill mismatch is a strong supply-chain/integrity red flag because users and orchestrators may invoke the skill under false assumptions, potentially exposing sensitive face videos and medical-like outputs to an unrelated or hidden workflow.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The documented API performs face analysis and returns medical/health-style diagnosis data, which materially conflicts with the declared purpose of a video search/retrieval skill. This mismatch is dangerous because it can conceal undeclared processing of biometric and health-related inferences, causing unauthorized collection, transmission, or secondary use of highly sensitive data.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The response schema includes 'diagnosis', organ-condition assessments, constitution labels, and health warnings/suggestions, which are unjustified for a video-search skill and indicate sensitive health inference from facial/video data. Such capability creates significant privacy, compliance, and misuse risk because users may submit videos for search while unknowingly receiving or enabling medical-style profiling.

Description-Behavior Mismatch

Medium

Confidence: 88%
Finding: This API service exposes add, edit, and delete record-management operations even though the declared skill purpose is video search and analysis. That scope mismatch increases risk because a caller using this skill for retrieval could also modify or remove backend records if upstream authorization is weak or the capability is invoked unintentionally.

Context-Inappropriate Capability

High

Confidence: 95%
Finding: The delete method enables destructive backend actions that are not justified by the stated video search/retrieval purpose. In an agent setting, hidden deletion capability is especially dangerous because it can be triggered through indirect tool use, causing data loss or tampering beyond user expectations.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The implementation behavior does not match the declared skill purpose: instead of video search/retrieval, it invokes face-analysis logic and returns diagnostic-style results. In a skill ecosystem, this mismatch is dangerous because users, reviewers, and policy gates may authorize a low-risk video-search capability while the code actually processes biometric and health-related data, enabling covert collection or misuse under false pretenses.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The CLI advertises a face-diagnosis tool even though the metadata says the skill is for video search. This contradiction strongly indicates capability smuggling: a user may install or approve one class of skill while receiving another that handles far more sensitive data and may bypass expected governance, consent, and review.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The code formats and exposes diagnostic/health-analysis outputs such as constitution, organ condition, warnings, and suggestions, which are unrelated to a video-search skill. This materially increases risk because it processes highly sensitive biometric and inferred health data without justification from the declared context, creating privacy, compliance, and misuse concerns.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The implementation materially conflicts with the declared skill purpose: instead of video search/retrieval, it performs face/health analysis, generates health-report text, and exports report images. In an agent ecosystem, this kind of capability mismatch is dangerous because users or orchestrators may route video-search inputs into a health-analysis pipeline, causing unexpected collection and processing of sensitive biometric/health data under false pretenses.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The validation/error message tells users they must provide a local or network video, while the surrounding code and outputs are for face-diagnosis analysis reports. This misleading interface increases the chance that users submit content believing they are invoking video retrieval, when in fact they may be initiating health/biometric analysis and downstream export behavior.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: This service exposes broad generic HTTP and CRUD wrappers (`add`, `edit`, `delete`, `http_get`, `http_post`, `http_put`, `http_delete`) that are not constrained to video-search operations. In the context of an agent skill, this expands the skill's effective capability from narrow video retrieval into arbitrary backend interaction, which can enable unauthorized actions, data access, or misuse if higher-level inputs are influenced by users or prompts.
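
One way to narrow generic wrappers like these is to check every outgoing call against an explicit allow-list of method and path pairs. The sketch below assumes a hypothetical `ALLOWED_CALLS` set and `check_call` helper; the paths are placeholders, since the report does not list the service's real endpoints.

```python
# Hypothetical allow-list for a video-search skill; the paths are placeholders.
ALLOWED_CALLS = frozenset({
    ("GET", "/videos/search"),
    ("GET", "/videos/detail"),
})


def check_call(method, path, allowed=ALLOWED_CALLS):
    """Return the normalized (method, path) pair, or raise if it is out of scope."""
    key = (method.upper(), path)
    if key not in allowed:
        raise PermissionError(f"{key[0]} {path} is outside the skill's declared scope")
    return key
```

With a gate of this kind in front of `http_delete` or `delete`, a destructive request would fail before any network traffic is sent.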

Description-Behavior Mismatch

Medium

Confidence: 71%
Finding: The module performs automatic schema mutation on initialization by issuing ALTER TABLE against a local user table, behavior not disclosed by the skill description. Undisclosed persistent state modification is dangerous because simply loading or using the skill changes local data structures, which can surprise operators, break existing deployments, and expand the blast radius if this shared database is reused by other components.
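
An alternative to mutating the schema on import is to report pending changes and apply them only after explicit approval. The sketch below assumes a SQLite database, which the report does not confirm; `pending_columns`, `migrate`, and the `approved` flag are hypothetical names.

```python
import sqlite3


def pending_columns(conn, table, wanted):
    """Return the wanted column names that the table does not have yet."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [name for name in wanted if name not in existing]


def migrate(conn, table, wanted, *, approved=False):
    """Add missing columns, but only after the operator has approved the change."""
    missing = pending_columns(conn, table, wanted)
    if missing and not approved:
        raise RuntimeError(f"schema change needs approval: {table} is missing {missing}")
    for name in missing:
        # table, name, and type are assumed to be trusted constants, never user input.
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {wanted[name]}")
    return missing
```

Here `wanted` maps column names to SQL types, for example `{"open_token": "TEXT"}`. Loading the module then has no side effects; the ALTER TABLE runs only when an operator calls `migrate(..., approved=True)`.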

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The skill metadata describes video search/analysis, but this file adds a generic agent/CLI invocation path via `AgentSkill.ai_chat`, which is unrelated to that purpose and materially expands the capability surface. In agent ecosystems, hidden or unnecessary LLM/CLI execution features increase the risk of prompt injection, unauthorized tool use, and policy bypass, especially because the method accepts arbitrary prompt input and constructs an external command for another agent.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The HTTP utility contains behavior unrelated to the declared video-search purpose: it auto-provisions accounts through /sys/phoneLogin, persists returned tokens, and injects a payment/recharge upsell flow on HTTP 402. That creates hidden identity, authentication, and billing side effects in a shared helper, which can surprise users and downstream callers and broadens the attack surface well beyond simple video retrieval.

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The code derives a username from ambient identity values and silently calls a phone-login endpoint with register=1 and silent=1, meaning it may create or recover an account without an explicit user action. It then stores the resulting token/openToken locally, which is dangerous because it can bind a user identity to remote services and persist credentials without clear authorization or disclosure.
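
A less surprising design would persist a credential only after the user has explicitly agreed, and would create the token file with owner-only permissions. The sketch below is an illustration under those assumptions; `save_token`, the `user_consented` flag, and the JSON layout are hypothetical and are not taken from the skill.

```python
import json
import os


def save_token(path, token, *, user_consented):
    """Write the token to path only when the user has explicitly consented."""
    if not user_consented:
        raise PermissionError("refusing to persist a credential without explicit user consent")
    # Mode 0o600 applies when the file is created; an existing file keeps its current mode.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"token": token}, handle)
```

The consent check runs before any file is opened, so a refused call leaves nothing on disk.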

Vague Triggers

Medium

Confidence: 91%
Finding: The default activation rule is broad enough to match many normal video-related requests, which can cause the skill to run when the user did not intend to invoke it. In this skill, accidental activation is more dangerous because execution may save attachments, read configuration, and call remote APIs with user-linked identifiers.

Vague Triggers

Medium

Confidence: 93%
Finding: The historical-report query triggers use broad phrases like viewing reports or results, making it easy to invoke cloud report listing unintentionally. Because this path requires user-linked open-id handling and retrieves potentially sensitive historical analysis data, ambiguous activation can expose prior records or metadata without sufficiently clear user consent.
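
Both trigger findings point toward narrower activation rules. As a rough sketch, a trigger could require an explicit search verb next to the word "video" instead of matching any video-related or report-related phrase. The pattern and `should_activate` below are hypothetical and are not the skill's actual trigger configuration.

```python
import re

# Hypothetical trigger: the user must ask to search or find a video explicitly.
TRIGGER = re.compile(
    r"\b(search|find)\s+(for\s+)?(a\s+|the\s+)?videos?\b",
    re.IGNORECASE,
)


def should_activate(message):
    """Return True only when the message contains an explicit video-search request."""
    return bool(TRIGGER.search(message))
```

Under this rule a message that merely mentions a video, or asks to see "my report", does not activate the skill.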

Missing User Warnings

Medium

Confidence: 89%
Finding: The documentation instructs users to submit local videos or remote video URLs to an API endpoint without clearly disclosing that facial video data may be transmitted to a third-party or self-hosted service. Because face videos are highly sensitive biometric data, missing privacy and transmission warnings can lead users to unknowingly expose personal or regulated information.

Missing User Warnings

Low

Confidence: 83%
Finding: The README says results are automatically saved to files but does not warn that analysis outputs may persist locally after execution. In this skill context, those outputs may contain biometric inferences and health-related assessments, so silent persistence increases the chance of later unauthorized access, accidental sharing, or retention beyond user intent.

Missing User Warnings

Medium

Confidence: 92%
Finding: The API documentation instructs users to upload videos or provide public video URLs for face analysis but omits any warning about processing sensitive biometric data, third-party video exposure, retention, transmission security, or consent requirements. This omission increases the likelihood of unsafe use and unauthorized processing of personal data, especially when facial imagery is involved.

Missing User Warnings

Medium

Confidence: 90%
Finding: The tool submits a local file path or remote URL for external analysis of face video without any user-facing disclosure that sensitive biometric and potentially health-related data may be processed remotely. In this skill context, that omission is more dangerous because the manifest frames the capability as ordinary video search, so users are less likely to expect or consent to off-device face/health inference.
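
The missing-warning findings could be addressed with a disclosure prompt shown before any upload. The sketch below assumes a hypothetical `confirm_upload` helper and notice text; neither is part of the skill, and the wording would need review against the service's real data handling.

```python
# Hypothetical notice text; adjust it to match what the remote service actually does.
NOTICE = (
    "This will send {source} to {endpoint} for face analysis. "
    "The video may contain biometric and health-related data. Continue? [y/N] "
)


def confirm_upload(source, endpoint, ask=input):
    """Show the disclosure and return True only on an explicit yes."""
    answer = ask(NOTICE.format(source=source, endpoint=endpoint))
    return answer.strip().lower() in {"y", "yes"}
```

The `ask` parameter defaults to `input` and can be swapped for a UI callback. Any answer other than an explicit yes, including an empty reply, is treated as a refusal.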

VirusTotal

VirusTotal findings are pending for this skill version.
