Byted Kickart Subtitle Extractor

Security checks across malware telemetry and agentic risk

Overview

This skill mostly matches a cloud subtitle-extraction workflow, but it asks for sensitive cloud credentials in chat and includes account, logging, and update behavior that is broader than the task requires; users should review it carefully.

Install only if you are comfortable sending videos and media metadata to the Volcengine/ByteDance cloud service and using cloud credentials with this skill. Prefer platform-provided API tokens over long-lived AK/SK secrets, do not paste secrets into ordinary chat, review the account registration and upgrade-command behavior, and clear local /tmp media caches and logs after use.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (28)

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The skill explicitly instructs the agent to ask users to paste ACCESS_KEY_ID and SECRET_ACCESS_KEY into chat and then export them into the runtime environment. Requesting long-lived cloud credentials in-band is highly dangerous because chat logs, tool traces, or downstream scripts may expose or misuse those secrets beyond the narrow subtitle task.
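One alternative to pasting keys into chat is to have the user set them in the runtime environment beforehand and let the skill read them from there. The following is a minimal sketch under that assumption; `load_credentials` and `mask` are hypothetical helpers, not functions from the scanned skill, and the variable names simply reuse the ones named in the finding.

```python
import os
from typing import Mapping, Tuple

# Variable names taken from the finding; the helpers below are illustrative only.
REQUIRED_VARS = ("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (access key ID, secret key) from env, or raise if either is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Name the missing variables, but never echo any secret value.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["ACCESS_KEY_ID"], env["SECRET_ACCESS_KEY"]


def mask(secret: str, visible: int = 4) -> str:
    """Mask a secret for display, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

With this shape the agent can confirm that credentials are present (for example by showing `mask(key_id)`) without the secret ever appearing in the conversation.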

Context-Inappropriate Capability

Medium

Confidence: 84%
Finding: Mandatory package/plan lookup is unrelated to the core act of extracting subtitles and expands the skill's access to account-level data and remote services. Unnecessary coupling of billing/account inspection with content processing violates least privilege and increases the attack surface.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The skill can query a remote upgrade endpoint, receive an install_command, and execute it if the user agrees. Executing remotely supplied installation commands creates a direct remote-code-execution and supply-chain risk, especially because the command origin and integrity are not independently verified.
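A partial mitigation is to accept a remotely supplied `install_command` only when it matches one pinned, fully specified form. The sketch below assumes a pip-style upgrade; the package name `example-subtitle-skill` and the helper `parse_install_command` are placeholders invented for illustration. It narrows the shape of the command but does not verify the integrity of the package itself, which would still need a hash or signature check.

```python
import re
import shlex
from typing import List, Optional

# Hypothetical allowlist: the only installer and package this skill may upgrade.
ALLOWED_PROGRAM = "pip"
ALLOWED_PACKAGE = "example-subtitle-skill"  # placeholder name, not from the source


def parse_install_command(command: str) -> Optional[List[str]]:
    """Return an argv list if the command matches the pinned form, else None.

    Accepted form: pip install example-subtitle-skill==<version>
    """
    try:
        argv = shlex.split(command)
    except ValueError:
        return None  # unbalanced quotes or similar
    if len(argv) != 3 or argv[0] != ALLOWED_PROGRAM or argv[1] != "install":
        return None
    name, sep, version = argv[2].partition("==")
    if name != ALLOWED_PACKAGE or sep != "==":
        return None
    # Version must be ASCII digits separated by dots, so no shell syntax survives.
    if not re.fullmatch(r"[0-9]+(\.[0-9]+)*", version):
        return None
    return argv
```

If the result is not `None`, the argv list can be passed to `subprocess.run(argv)` without `shell=True`, so the string is never interpreted by a shell.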

Description-Behavior Mismatch

High

Confidence: 96%
Finding: The implementation does not appear constrained to subtitle extraction despite the skill being described as a video subtitle tool. It submits generic AI template jobs using caller-controlled service/template identifiers and arbitrary extra parameters, creating a functionality mismatch that could route user data to unrelated remote services or enable unintended capabilities beyond the declared skill scope.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The post(action, params) method acts as a generic proxy to the ICCP client and accepts an arbitrary action string and raw bytes payload. In a skill advertised for subtitle extraction, this broad remote invocation surface can be abused to call unrelated backend operations, bypass feature scoping, and potentially access or manipulate remote services with the skill's configured credentials.
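One way to narrow a generic `post(action, params)` surface is to wrap it behind an explicit action allowlist. This is a sketch only: `ScopedClient` is a hypothetical wrapper, and the action names `SubmitSubtitleJob` and `GetSubtitleJob` are invented placeholders, since the report does not list the real subtitle-related actions.

```python
from typing import Any, Callable, FrozenSet

# Hypothetical action names; the real ones would come from the service's API.
ALLOWED_ACTIONS: FrozenSet[str] = frozenset({"SubmitSubtitleJob", "GetSubtitleJob"})


class ScopedClient:
    """Wraps a generic post(action, params) callable behind an action allowlist."""

    def __init__(self, post: Callable[[str, bytes], Any],
                 allowed: FrozenSet[str] = ALLOWED_ACTIONS) -> None:
        self._post = post
        self._allowed = allowed

    def post(self, action: str, params: bytes) -> Any:
        # Refuse anything outside the declared scope before it reaches the network.
        if action not in self._allowed:
            raise PermissionError(f"action not permitted for this skill: {action!r}")
        return self._post(action, params)
```

With such a wrapper, a call like `post("RegisterArkClawCombo", ...)` would fail locally instead of reaching the backend with the skill's credentials.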

Description-Behavior Mismatch

High

Confidence: 97%
Finding: This file implements broad media upload, creation, polling, and formatting functionality that goes well beyond a subtitle-extraction skill’s declared purpose. In a skill that should only extract subtitles from user-provided videos, hidden media-management capabilities increase the attack surface and create opportunities for unauthorized upload, storage, or processing of user media in backend systems.

Context-Inappropriate Capability

High

Confidence: 98%
Finding: The IAM service code enumerates users and retrieves an admin or fallback user ID, which is unrelated to subtitle extraction and can facilitate privilege misuse. In this context, account discovery is especially risky because it can be combined with media operations to act on behalf of privileged identities without clear user intent or disclosure.

Context-Inappropriate Capability

Medium

Confidence: 84%
Finding: The formatter supports image and audio media flows even though the skill is described as a video subtitle extractor. While not automatically exploitable on its own, this unnecessary modality expansion broadens reachable code paths and may enable unintended processing or exfiltration of non-video assets.

Description-Behavior Mismatch

High

Confidence: 95%
Finding: This code implements a full remote media upload flow to a Muse service, including hashing local files, querying upload state, streaming file contents, creating remote materials, and polling for remote metadata. For a skill whose declared purpose is subtitle extraction/export, this is unnecessary privilege and data movement that can exfiltrate user-provided media to an external service without clear in-scope justification.

Context-Inappropriate Capability

High

Confidence: 96%
Finding: The uploader is general-purpose and classifies arbitrary files as image or video before sending them to a remote material service, which exceeds the narrow function of extracting subtitles from a video. In the context of this skill, that mismatch is dangerous because it creates an undeclared capability to transmit user media externally, increasing the risk of privacy violations and misuse.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The repository and cache layers persist uploaded media records and per-media JSON metadata on local disk, including paths, IDs, serialized material data, and timestamps. For a subtitle-extraction skill, retaining an upload registry and cache broadens the data footprint beyond immediate task execution and can expose sensitive media metadata or enable later reuse without user awareness.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The skill advertises local subtitle extraction from a video file, but the implementation retrieves a remote media URL from a backend service and submits it to another external processing service. This creates a security and privacy mismatch: users may believe processing is local when their media location and content are actually sent over the network to third-party/backend systems.

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The code relies on privileged backend capabilities—media lookup by ID and remote job submission—that exceed what a simple 'extract subtitles from video file' tool would normally need. That broader access increases the chance of unauthorized data access or unintended processing of media beyond the user's expectations, especially in an agent skill context where users may not understand backend data flows.

Description-Behavior Mismatch

High

Confidence: 99%
Finding: The file’s executable behavior does not perform subtitle extraction at all; instead it instantiates an external service client and invokes a registration action named 'RegisterArkClawCombo'. In the context of a skill advertised as video subtitle extraction, this is a strong capability mismatch that could cause unauthorized external account/package registration or abuse user trust to trigger unrelated remote operations.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The inline comment and docstring explicitly state that the command queries/registers a free Ark Claw package, directly contradicting the declared subtitle-extraction functionality. This kind of misleading labeling increases the risk of deceptive execution, because reviewers or users may believe they are invoking media processing while the code performs an unrelated external action.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: This script performs remote task submission and polling through an ICCP service, which is outside the stated subtitle-extraction functionality of the skill. Even if intended as an internal upgrade helper, it introduces undocumented networked behavior that could trigger external actions, leak metadata, or expand the trust boundary without user awareness.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The code adds a network-capable remote-management path by submitting a job to an external service and handling asynchronous results, but this capability is not justified by the skill's advertised purpose. In a subtitle extraction skill, unexpected remote orchestration is riskier because users and reviewers would not expect off-box task control or hidden operational dependencies.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The script’s implemented behavior is to upload a user-supplied local video to a media service and return a media asset identifier, which materially differs from the skill’s stated purpose of extracting subtitles from a video. In a skill context, this mismatch is dangerous because users may provide sensitive local media expecting local subtitle extraction, while the tool silently performs external transmission/registration instead, creating privacy, consent, and data-handling risk.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The docstring explicitly states that the tool uploads a local video file and obtains a media ID, contradicting the advertised subtitle-extraction purpose. This inconsistency increases the chance that operators or downstream agents invoke the tool under false assumptions, leading to unexpected disclosure of local media to remote infrastructure.

Missing User Warnings

High

Confidence: 99%
Finding: The skill tells users to send ACCESS_KEY_ID and SECRET_ACCESS_KEY directly in chat, without warning them not to share secrets in conversation or explaining retention risks. This is dangerous because conversational channels are commonly logged, replayed, and accessible to tooling, making credential compromise far more likely.

Missing User Warnings

Medium

Confidence: 91%
Finding: The skill downloads arbitrary public URLs to local storage and then processes the resulting file, but does not disclose the network fetch and file-write behavior or constrain the source. This can enable SSRF-like internal fetches, retrieval of maliciously large files, or unexpected local persistence when users believe they are only providing a link for subtitle extraction.
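A common guard against the SSRF-like fetches described here is to check, before downloading, that the URL uses http(s) and that its host resolves only to public addresses. The sketch below is illustrative: `is_safe_url` and `resolve_host` are hypothetical helpers, and the resolver is injectable so the check can be exercised without network access. It does not cover DNS rebinding (the later fetch resolves the name again) or file-size limits, which would need separate handling.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit


def resolve_host(host: str) -> Iterable[str]:
    """Default resolver: every address the host name resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_safe_url(url: str,
                resolver: Callable[[str], Iterable[str]] = resolve_host) -> bool:
    """True only for http(s) URLs whose host resolves solely to public addresses."""
    try:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return False
        addresses = list(resolver(parts.hostname))
        if not addresses:
            return False
        # is_global is False for loopback, private, and link-local ranges.
        return all(ipaddress.ip_address(a).is_global for a in addresses)
    except (OSError, ValueError):
        # Malformed URL, failed resolution, or an unparseable address.
        return False
```

Rejecting a host when any one of its addresses is non-public is a deliberately strict choice; a looser policy would leave room for mixed DNS answers to reach internal services.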

Missing User Warnings

High

Confidence: 98%
Finding: The request log includes the full URL, headers, and raw body, which exposes Authorization material and potentially sensitive user content sent for subtitle extraction. If logs are accessible to operators, support systems, or other services, this can leak bearer tokens, signed request metadata, and private video-derived data, enabling replay, abuse, or data disclosure.
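Request logging of this kind is usually made safer by redacting credential-bearing headers, dropping the query string (which may carry signature parameters), and recording only the body size. The following is a minimal sketch; `redact_headers`, `describe_request`, and the header list are assumptions for illustration and are not taken from the skill's code.

```python
from typing import Dict, Mapping
from urllib.parse import urlsplit

# Illustrative list of header names to hide (compared case-insensitively).
SENSITIVE_HEADERS = frozenset({"authorization", "cookie", "x-security-token"})


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of headers that is safe to log; the input is not modified."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def describe_request(method: str, url: str,
                     headers: Mapping[str, str], body: bytes) -> str:
    """Build a log line without the query string, secrets, or body content."""
    safe_url = urlsplit(url)._replace(query="").geturl()
    return f"{method} {safe_url} headers={redact_headers(headers)} body_bytes={len(body)}"
```

Logging the body length rather than the body keeps enough detail for debugging while leaving video-derived data out of the log.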

Missing User Warnings

Medium

Confidence: 99%
Finding: The request logging records full HTTP headers and request bodies, which includes bearer tokens or signed authorization material and potentially sensitive media metadata. If logs are accessible to operators, other services, or attackers, this can lead to credential theft, request replay, and exposure of user data.

Missing User Warnings

Medium

Confidence: 90%
Finding: The upload flow sends local file data and derived metadata such as hashes, file size, filename components, and ownership context to a remote service, but this file contains no indication of consent, notice, or disclosure controls. In a subtitle-extraction skill, silent transfer of user media is particularly risky because users may reasonably expect local processing rather than remote ingestion.

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill sends the video's URL and duration to a remote service without any user-facing warning or consent mechanism. In a subtitle-extraction skill, this is particularly risky because users are likely to assume their video is processed locally; silent network transfer can expose sensitive media metadata or content to external systems.

VirusTotal

65/65 vendors flagged this skill as clean.
