Kaipai

Security checks across malware telemetry and agentic risk

Overview

This skill is a Kaipai media-processing integration with chat delivery helpers, and that behavior is disclosed. Users should still understand that it relies on paid API keys, uploads media over the network, and uses messaging credentials.

Install only if you are comfortable giving this skill Kaipai API credentials and, when using chat delivery, access to the relevant Feishu or Telegram bot credentials. Use it for user-requested media processing and delivery back to the same intended chat; avoid passing internal/private URLs or arbitrary recipients unless you explicitly trust that flow.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (31)

Tainted flow: 'files' from requests.get (line 53, network input) → requests.post (network output)

Medium

Category: Data Flow
Content:
    img_bytes = r.content
    filename = image_source.split("?")[0].split("/")[-1] or "image.jpg"
    files = {"photo": (filename, img_bytes, "image/jpeg")}
    resp = requests.post(
        f"{TELEGRAM_API_BASE}/bot{token}/sendPhoto",
        data=data,
        files=files,
Confidence: 91%
Finding: resp = requests.post( f"{TELEGRAM_API_BASE}/bot{token}/sendPhoto", data=data, files=files, timeout=(CONNECT_TIMEOUT, UPLOAD_
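One way to narrow a flow like this is to inspect the downloaded bytes before they are forwarded. The sketch below is a minimal illustration and not the skill's own code: `sniff_image_type`, `MAX_IMAGE_BYTES`, and the 10 MiB cap are all hypothetical names and values chosen for the example. It only checks size and leading signature bytes, which reduces but does not eliminate the risk of relaying unexpected content.

```python
# Hypothetical helper: none of these names come from the scanned skill.
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed 10 MiB cap, for illustration only

# Leading signature ("magic") bytes for two common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_image_type(data: bytes, max_bytes: int = MAX_IMAGE_BYTES):
    """Return a MIME type if data looks like a supported image, else None."""
    # Reject empty payloads and anything above the size cap.
    if not data or len(data) > max_bytes:
        return None
    for magic, mime in _SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None
```

A caller could refuse to build the `files` payload when this returns `None`, and otherwise use the returned MIME type instead of a hard-coded one.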

Tainted flow: 'files' from requests.get (line 53, network input) → requests.post (network output)

Medium

Category: Data Flow
Content:
    except Exception as exc:
        print(f"[telegram] Thumbnail download error: {exc}, skipping", file=sys.stderr)
    resp = requests.post(
        f"{TELEGRAM_API_BASE}/bot{token}/sendVideo",
        data=data,
        files=files,
Confidence: 90%
Finding: resp = requests.post( f"{TELEGRAM_API_BASE}/bot{token}/sendVideo", data=data, files=files, timeout=(CONNECT_TIMEOUT, UPLOAD_

Lp3

Medium

Category: MCP Least Privilege
Confidence: 94%
Finding: The skill declares no permissions while its instructions require shell execution, environment access, file reads/writes, and network operations. This undermines least-privilege controls and hides the true attack surface from reviewers and runtime policy, making risky capabilities easier to grant implicitly.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 90%
Finding: The skill is presented as a narrow Kaipai watermark/restoration helper, but the documented behavior exposes broader capabilities including generic task execution paths, input fetching, status/history access, uploads, and outbound messaging. This mismatch can mislead operators into approving a more powerful skill than intended, increasing the chance of misuse or data exfiltration through underestimated functionality.

Context-Inappropriate Capability

Medium

Confidence: 97%
Finding: The skill instructs the agent to read host-side OpenClaw configuration containing Feishu credentials and then use those credentials to call Feishu APIs directly. Accessing unrelated platform secrets for a media-processing skill violates scope boundaries and creates a credential exposure and cross-system abuse path if the skill or its outputs are manipulated.

Description-Behavior Mismatch

High

Confidence: 94%
Finding: This file introduces a Feishu outbound messaging capability that is not aligned with the manifest-declared purpose of Kaipai AI media restoration and watermark removal. A capability mismatch like this is dangerous because it can be used to exfiltrate processed or user-supplied images to arbitrary recipients, and users or reviewers would not reasonably expect that behavior from this skill.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The code accepts an arbitrary recipient identifier and sends an image through Feishu without visible authorization checks or purpose limitation in this wrapper. In the context of a media-processing skill, that creates a plausible exfiltration path for user content or processed results to third parties, which is more dangerous because the capability is unrelated to the advertised function.
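A purpose-limitation check of the kind this finding says is missing could look roughly like the following. This is a sketch under stated assumptions: `is_permitted_recipient`, `origin_chat_id`, and `extra_allowed` are hypothetical names, and the rule it encodes (deliver only to the chat that made the request, plus an optional explicit allowlist) is one possible policy rather than anything the skill implements.

```python
def is_permitted_recipient(recipient_id: str, origin_chat_id: str,
                           extra_allowed: frozenset = frozenset()) -> bool:
    """Permit delivery only to the originating chat or an explicit allowlist.

    All parameter names are illustrative; the scanned wrapper exposes no
    such check according to the finding above.
    """
    # An empty or missing recipient is never acceptable.
    if not recipient_id:
        return False
    return recipient_id == origin_chat_id or recipient_id in extra_allowed
```

Calling this before any send, and refusing when it returns `False`, would keep delivery bound to the chat that requested the processing.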

Context-Inappropriate Capability

High

Confidence: 96%
Finding: This script adds a Feishu video-sending capability that is outside the declared Kaipai AI media-processing purpose of the skill. In an agent environment, undeclared outbound messaging can be abused to exfiltrate user-provided media to third parties, and the mismatch between manifest and code reduces transparency and user consent.

Description-Behavior Mismatch

High

Confidence: 98%
Finding: The file's actual behavior is to transmit videos via Feishu rather than perform the promised Kaipai AI processing tasks. That hidden capability is dangerous because an operator or compromised workflow could route sensitive user videos to external recipients under the guise of media processing, creating a clear data leakage channel.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The docstring explicitly states this is a Feishu wrapper, which contradicts the documented Kaipai AI processing intent and signals that the package includes functionality unrelated to its stated purpose. Such inconsistencies increase the likelihood of deceptive packaging, missed review scope, and accidental exposure of user content through undeclared channels.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The skill manifest describes a Kaipai media-processing tool, but the CLI also exposes outbound notification capabilities via Feishu/Telegram. That expands the skill from media transformation into external message delivery, which can be abused for unauthorized data exfiltration, spam, or sending processed/private media to third parties outside the declared scope. In this context, the mismatch makes the feature more dangerous because users and hosting systems may grant trust based on the narrower manifest.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The top-level help text advertises generic task usage such as txt2img, which goes beyond the four Kaipai operations declared in the skill metadata. This broadens operator expectations and may enable invocation of undeclared remote capabilities, undermining least privilege and making review of allowed behavior unreliable.

Context-Inappropriate Capability

Medium

Confidence: 87%
Finding: The resolve-input command can fetch arbitrary HTTP(S) URLs and write their content to disk, even though the skill's stated purpose is Kaipai image/video processing. This creates a generic downloader primitive that can be used for unintended network access, staging remote content locally, or bypassing tighter controls expected for a narrowly scoped media-processing skill.
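A downloader like this is commonly constrained by validating the URL before any request is made. The sketch below is illustrative only: `is_safe_public_url` is a hypothetical name, and the check covers the scheme, `localhost`, and IP-literal hosts using the standard-library `ipaddress` module. It deliberately omits DNS resolution, so a hostname that resolves to an internal address would still pass; a fuller guard would resolve the name and check every returned address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_public_url(url: str) -> bool:
    """Reject non-HTTP(S) URLs and hosts that are obviously internal."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) are rejected.
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Assumption: hostname resolution is checked elsewhere.
        return True
    # is_global is False for loopback, private, and link-local ranges.
    return addr.is_global
```

Under this check, `http://169.254.169.254/latest` and `http://10.0.0.5/x` are refused, while an ordinary public HTTPS URL is accepted.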

Intent-Code Divergence

Medium

Confidence: 83%
Finding: The help examples encourage unsupported generic task usage, contradicting the manifest's narrow description. While this is documentation-facing, it still matters operationally because users and agents often rely on help text to determine allowed behavior; misleading examples can drive use of undeclared functionality and weaken policy enforcement.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The file implements a full outbound Telegram messaging channel even though the skill manifest describes Kaipai AI media-processing operations, not external notification or messaging features. This scope expansion increases the attack surface for covert data exfiltration and unexpected third-party transmissions, making it more dangerous in this context than it would be in a messaging-focused skill.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The skill accesses TELEGRAM_BOT_TOKEN despite its declared function being media restoration/watermark removal. Reading unrelated messaging credentials broadens privilege and, combined with the notifier, enables external communication channels that users or reviewers may not expect.

Description-Behavior Mismatch

High

Confidence: 91%
Finding: This file implements Telegram outbound messaging, which is outside the declared Kaipai AI skill scope of media restoration and watermark removal. In an agent environment, adding an unrelated exfiltration or notification channel is dangerous because processed user media or metadata could be sent to arbitrary chat IDs, expanding data-flow and abuse potential beyond the user-declared function.

Context-Inappropriate Capability

High

Confidence: 95%
Finding: The code creates a generic outbound Telegram capability via `TelegramNotifier().send_image(...)` using attacker-controlled `--image`, `--to`, and optional caption inputs. Within this skill context, that capability is unjustified and can enable unauthorized transmission of sensitive generated outputs or local files/URLs to external recipients, making the mismatch more dangerous because the skill handles user-supplied media.

Description-Behavior Mismatch

High

Confidence: 96%
Finding: This file implements Telegram outbound messaging, which is outside the stated Kaipai AI media-processing purpose of the skill. A capability mismatch like this is dangerous because it can be used to exfiltrate processed media or send arbitrary content to external recipients without being justified by the manifest, reducing transparency and expanding the attack surface.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The code adds an outbound communication channel via Telegram Bot API by accepting a chat_id and sending a local video file plus optional URL/caption data to an external party. In the context of a media-processing skill, this creates a covert data-transfer path that could leak user media, links, or derived outputs to unauthorized destinations, and the skill context makes it more dangerous because Telegram delivery is not part of the declared processing workflow.

Intent-Code Divergence

High

Confidence: 88%
Finding: The async example presents a generic pattern for long-running tasks that uses direct SDK polling and does not mention the skill's required control path for video jobs: spawn-run-task plus sessions_spawn in the main session. In an agent setting, this can cause implementers to bypass the mandated isolation/orchestration path for expensive video processing, leading to policy violations, mis-execution, or resource abuse.

Context-Inappropriate Capability

Medium

Confidence: 86%
Finding: The skill can fetch arbitrary HTTP(S) URLs and remote resources from Telegram and Feishu, which expands it from a Kaipai media-processing wrapper into a general network retrieval utility. In an agent setting, this increases SSRF-like abuse potential, internal resource access attempts, and unexpected data handling beyond the declared scope, making the skill more dangerous than its manifest suggests.

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The skill manifest limits the intended capability to four specific restoration/watermark-removal tasks, but this API exposes generic txt2img and img2img helpers that enable broader image generation and transformation. In an agent setting, this creates scope expansion and policy-bypass risk because downstream code can invoke undeclared functionality without any local allowlist enforcement.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: invoke_task executes any entry present in config.INVOKE, which means the effective task surface is configuration-driven rather than constrained by the skill’s declared scope. If config is modified, extended, or attacker-influenced, the skill can submit arbitrary Kaipai jobs outside the approved four-task workflow, enabling unauthorized capability use and quota consumption.
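A local allowlist in front of the configuration table is one way to decouple the permitted task surface from whatever the configuration happens to contain. The sketch below assumes a dict-like table of callables standing in for `config.INVOKE`; `guarded_invoke` and `DECLARED_TASKS` are hypothetical, and the four task names are placeholders, since the report states that four tasks are declared but does not list their identifiers.

```python
# Placeholder task names: the report does not list the four declared tasks,
# so these identifiers are assumptions for illustration only.
DECLARED_TASKS = frozenset({
    "image_restoration",
    "video_restoration",
    "image_watermark_removal",
    "video_watermark_removal",
})


def guarded_invoke(task_name, invoke_table, allowed=DECLARED_TASKS, **params):
    """Run a task only if it is both allowlisted and present in the table."""
    # The allowlist is checked first, so adding entries to the table alone
    # does not widen what can be executed.
    if task_name not in allowed:
        raise PermissionError(f"task {task_name!r} is outside the declared scope")
    handler = invoke_table.get(task_name)
    if handler is None:
        raise KeyError(f"task {task_name!r} is not configured")
    return handler(**params)
```

With this arrangement, a `txt2img` entry added to the table is still refused unless the allowlist is changed as well, which makes scope changes visible in review.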

Missing User Warnings

Low

Confidence: 76%
Finding: The wrapper sends an image to an external Feishu recipient without any explicit warning or consent prompt in this interface, which can cause users to unknowingly transmit sensitive media off-platform. In a skill advertised for image/video processing, hidden network transmission is more concerning because users may assume local or service-bound processing only.
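An explicit confirmation step before the send is one way to address this. The following is a minimal sketch, not the wrapper's actual interface: `confirm_external_send` is a hypothetical function, and the `ask` parameter (defaulting to the built-in `input`) is an assumption that lets an agent runtime or a test supply its own prompt mechanism. Anything other than an explicit yes is treated as a refusal.

```python
def confirm_external_send(recipient: str, filename: str, ask=input) -> bool:
    """Ask the user to approve an off-platform send; default to refusing."""
    prompt = f"Send {filename!r} to external recipient {recipient!r}? [y/N] "
    answer = ask(prompt)
    # Only an explicit "y" or "yes" counts as consent.
    return answer.strip().lower() in ("y", "yes")
```

The send would proceed only when this returns `True`, so an empty reply or an unrecognized answer leaves the media where it is.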

VirusTotal

63 of 63 vendors reported this skill as clean.
