Doubao Image Video Skill

Security checks across malware telemetry and agentic risk

Overview

This is mostly a Doubao API wrapper, but it needs review because it makes watermark/logo removal a default, recommended workflow without adequate authorization or privacy guardrails.

Install only if you are comfortable sending prompts, image URLs, and task metadata to Volcengine ARK under your own API key. Use a limited key, avoid printing the full key in terminals or logs, and do not use the edit workflow to remove watermarks, logos, attribution, or ownership marks unless you own the content or have explicit permission. Verify the edit action before relying on it, because the advertised edit capability does not appear to be implemented in the shipped backend script.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (29)

Context-Inappropriate Capability

Medium

Confidence: 87%
Finding: The skill documentation recommends invoking local Python scripts through subprocesses for functionality that could be implemented directly via an SDK or module import. Subprocess-based designs increase attack surface through path confusion, command misuse, accidental execution of tampered local files, and harder-to-audit data flows involving secrets like API keys.
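Where a subprocess cannot be avoided, the path-confusion and secret-handling risks named in this finding can be narrowed. The following is a minimal sketch, not the skill's actual code: the function name `build_command`, the script name `doubao.py`, and the `--prompt` flag are hypothetical and chosen only for illustration.

```python
import sys
from pathlib import Path


def build_command(skill_dir, script_name, action, prompt):
    """Return an argv list for running a skill script without a shell.

    All names here are illustrative assumptions, not the skill's real CLI.
    """
    base = Path(skill_dir).resolve()
    script = (base / script_name).resolve()
    # Reject names such as "../other.py" that escape the skill directory.
    if base not in script.parents:
        raise ValueError(f"script outside skill directory: {script}")
    # A list argv (used without shell=True) keeps the prompt from being
    # parsed as shell syntax. The API key is deliberately absent from argv:
    # the child process is expected to read it from its environment.
    return [sys.executable, str(script), action, "--prompt", prompt]
```

The returned list could be passed to `subprocess.run(cmd, check=True)`. Importing the backend as a module and calling it directly, as the finding suggests, would remove the need for this step altogether.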

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: The documentation explicitly promotes image editing for removing watermarks, which facilitates bypassing attribution and ownership markings on third-party content. In the context of a generic media-generation skill, this is a misuse-enabling capability rather than a clearly legitimate core function, and it increases risk of copyright infringement and policy abuse.

Context-Inappropriate Capability

Medium

Confidence: 98%
Finding: The API reference defines the edit action as AI-based watermark removal, normalizing a rights-bypassing use case in official docs. That materially increases the likelihood that downstream agents or users will invoke the skill for infringing or deceptive purposes.

Context-Inappropriate Capability

Medium

Confidence: 99%
Finding: The prompt guidance and examples repeatedly suggest removing watermarks, logos, text, and overlays, which operationalizes misuse and lowers the barrier to abuse. In a skill that directly interfaces with image-editing APIs, such guidance makes harmful use substantially easier.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The CLI help explicitly promotes watermark removal as a normal use case, which steers users toward copyright/circumvention-oriented misuse rather than neutral image editing. In this skill context, the danger is increased because the tool is specifically designed to send user prompts to a media-generation/editing backend, so the documentation materially enables misuse rather than merely mentioning it abstractly.

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The inline documentation frames image editing as including watermark removal, normalizing a misuse-oriented action and implicitly encouraging policy-violating requests. Because this is end-user-facing CLI guidance, it directly influences operator behavior and makes abusive use more likely.

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The code defaults the edit prompt to 'remove watermark, keep main content', meaning the tool will attempt a circumvention-oriented transformation even when the user provides no explicit edit instruction. This is more dangerous than documentation alone because the insecure behavior is hardcoded into runtime behavior and can trigger misuse by default.
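One way to address the hardcoded default is to refuse an edit request that carries no prompt instead of substituting one. This is a hedged sketch under the assumption that the edit prompt arrives as an optional string; `resolve_edit_prompt` is a hypothetical helper, not a function from the shipped script.

```python
def resolve_edit_prompt(prompt):
    """Return the caller's edit prompt, refusing to invent one.

    Hypothetical helper: a missing or blank prompt is an error rather
    than a trigger for a built-in transformation.
    """
    if prompt is None or not prompt.strip():
        raise ValueError("the edit action requires an explicit prompt")
    return prompt.strip()
```

With this shape, calling the edit action with only an image URL fails fast, so no transformation runs unless the user has stated one.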

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The image-edit feature explicitly documents and defaults to watermark removal, which promotes copyright and attribution circumvention through an otherwise benign editing API. In this skill context, that is more dangerous because the capability is normalized as a first-class supported workflow rather than a user-supplied edge case.

Missing User Warnings

Medium

Confidence: 94%
Finding: The README explicitly promotes image editing for '去除水印' (watermark removal) without any warning about copyright, authorization, or policy restrictions. In a generative media skill, this normalizes a misuse case that can facilitate infringement, provenance removal, or circumvention of ownership controls, making the documented capability materially risky in context.

Missing User Warnings

Medium

Confidence: 97%
Finding: The documentation explicitly demonstrates watermark removal and provides operational prompts for doing so, without any warning about copyright, ownership, or misuse risks. This materially enables rights-evasion behavior and normalizes a use case that can facilitate unauthorized redistribution or tampering with provenance marks.

Missing User Warnings

Low

Confidence: 78%
Finding: The troubleshooting section encourages displaying API key values on screen with commands like echoing the environment variable, but does not warn that this may expose secrets via terminal history, shoulder-surfing, logs, or shared sessions. While not as severe as exfiltration code, it is still insecure secret-handling guidance.

Missing User Warnings

Medium

Confidence: 92%
Finding: The guide instructs users to set an API key and send prompts, image URLs, and task identifiers to ByteDance/Doubao without any disclosure that this data leaves the local environment and is processed by a third-party service. In an agent skill context, this omission can cause users to unknowingly transmit sensitive prompts or private media references to an external provider.

Missing User Warnings

Medium

Confidence: 95%
Finding: The documentation explicitly promotes watermark removal as a supported feature without warning about copyright, ownership, platform policy, or consent implications. This normalizes a potentially unlawful or abusive use case and increases the likelihood that downstream users employ the skill for rights-stripping or deceptive content manipulation.

Missing User Warnings

Medium

Confidence: 92%
Finding: The README tells users to export and echo the API key, which encourages credential exposure in terminal history, logs, screenshots, and shared shells without any warning about secrecy. While this is documentation rather than executable theft logic, it normalizes unsafe secret-handling practices and can lead to accidental compromise of the ARK_API_KEY.
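Troubleshooting rarely needs the key's value, only confirmation that it is set. A minimal sketch of that check follows; `describe_key` is a hypothetical helper, and the only name taken from the report is the `ARK_API_KEY` variable.

```python
import os


def describe_key(name="ARK_API_KEY", environ=None):
    """Report whether a secret is set without revealing its value."""
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        return f"{name} is not set"
    # Only the length is shown; no part of the secret is printed.
    return f"{name} is set ({len(value)} characters)"
```

Printing `describe_key()` gives the same diagnostic signal as echoing the variable, while keeping the secret out of scrollback, screenshots, and captured logs.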

Missing User Warnings

Medium

Confidence: 89%
Finding: The quick-start examples instruct users to submit prompts, image URLs, and media-edit requests to the ByteDance/Doubao remote API without clearly warning that this data leaves the local environment and is processed by a third party. This creates privacy and compliance risk, especially for sensitive prompts, internal image URLs, or proprietary media, and the example explicitly includes watermark/logo removal, a higher-risk use case.

Missing User Warnings

Medium

Confidence: 94%
Finding: The documentation presents watermark removal as a normal feature without any warning about copyright, ownership, or legal restrictions. Omitting those warnings in a skill doc can encourage unauthorized use and increases compliance and abuse risk.

Natural-Language Policy Violations

High

Confidence: 96%
Finding: The CLI both advertises and operationalizes watermark removal without any warning, restriction, or compliance gate. In a media-editing skill, that materially lowers the barrier to copyright and attribution circumvention, increasing the likelihood of abusive use.
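A compliance gate of the kind this finding says is missing could be as small as the sketch below. It is an illustration only: the term list, the `rights_confirmed` parameter, and the function name are assumptions, and plain keyword matching is crude (it can both miss rephrased requests and flag harmless ones), so it is a prompt for explicit confirmation rather than a real control.

```python
# Illustrative, incomplete list of terms that suggest removal of
# attribution or ownership marks ("水印" is "watermark" in Chinese).
RIGHTS_SENSITIVE_TERMS = ("watermark", "logo", "水印")


def check_edit_allowed(prompt, rights_confirmed=False):
    """Raise PermissionError for rights-sensitive edits lacking confirmation."""
    lowered = prompt.lower()
    hits = [term for term in RIGHTS_SENSITIVE_TERMS if term in lowered]
    if hits and not rights_confirmed:
        raise PermissionError(
            "prompt mentions " + ", ".join(hits)
            + "; confirm you own the content or have explicit permission"
        )
    return True
```

A CLI could expose `rights_confirmed` as an explicit opt-in flag, so that a rights-sensitive edit is never the path of least resistance.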

Missing User Warnings

Medium

Confidence: 87%
Finding: User-supplied prompts are transmitted to a third-party API, but the script provides no user-facing disclosure, consent flow, or warning that prompt contents leave the local environment. If users enter sensitive data, that information may be exposed to an external service and retained or processed under that provider's policies.
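The missing disclosure and consent step could take roughly the following form. This is a sketch under stated assumptions: the field names echo the data types listed in this report, while the function names and the wording of the notice are hypothetical.

```python
# Data types this report says are transmitted to the remote service.
REMOTE_FIELDS = ("prompt", "image URL", "task ID")


def build_notice(provider, fields=REMOTE_FIELDS):
    """Return a one-line disclosure of what leaves the local environment."""
    return f"The following will be sent to {provider}: " + ", ".join(fields) + "."


def confirm_transmission(provider, ask=input, show=print):
    """Show the disclosure and return True only on an explicit yes."""
    show(build_notice(provider))
    answer = ask("Continue? [y/N] ")
    # Anything other than "y" or "yes" is treated as a refusal.
    return answer.strip().lower() in ("y", "yes")
```

The `ask` and `show` parameters exist so that a non-interactive agent can supply its own confirmation channel in place of the terminal.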

Missing User Warnings

Medium

Confidence: 88%
Finding: The video-generation path sends raw user prompts to an external API without any visible disclosure or privacy notice. In a skill context, users may reasonably assume local processing, so silent exfiltration of prompt contents can expose confidential or regulated information.

Missing User Warnings

Medium

Confidence: 96%
Finding: The code silently defaults the edit prompt to `remove watermark, keep main content`, enabling a harmful transformation without explicit user acknowledgment or warning. This increases the risk of misuse because a caller can trigger watermark removal simply by invoking the edit action with an image URL and no prompt.

Ssd 4

Medium

Confidence: 98%
Finding: The skill gives a structured workflow for removing watermarks and logos, including suggested prompts and parameter tuning to improve success. This is dangerous because it meaningfully assists repeated attempts to defeat ownership or attribution markings rather than merely mentioning a neutral image-editing capability.

Ssd 4

Medium

Confidence: 97%
Finding: The troubleshooting and test sections reinforce the watermark-removal workflow by encouraging retries, stronger prompts, and parameter adjustments until the mark is gone. This repetition increases practical misuse by turning a questionable capability into an optimization guide for bypassing protective markings.

Ssd 4

Medium

Confidence: 96%
Finding: Presenting 'remove watermark' as a standard workflow makes potentially abusive image modification look endorsed by the skill author. In a general-purpose image editing skill, this context makes the behavior more dangerous because the documentation lowers friction for misuse rather than constraining it.

Ssd 2

Medium

Confidence: 97%
Finding: The docs paraphrase watermark/logo removal as an intended editing function, which meaningfully promotes removal of attribution or protective marks. That creates misuse risk even if phrased as a benign editing example.

Ssd 2

Medium

Confidence: 98%
Finding: The suggested prompts provide concrete wording for removing attribution-related elements, making it easy for users or agents to repurpose the tool for content laundering. This is especially risky because prompt examples directly shape end-user behavior.

VirusTotal

66/66 vendors flagged this skill as clean.
