Bilibili Video Transcriber

Security checks across malware telemetry and agentic risk

Overview

The skill is a genuine Bilibili transcription tool, but it handles account cookies and shares data externally through Feishu/Lark with enough automatic behavior and loose scoping that users should review it carefully before installing.

Install only if you are comfortable granting this skill access to your Bilibili session cookies, and with the possibility that transcripts, video metadata, and comments are sent to Feishu/Lark whenever lark-cli is configured. Before use, remove the bundled cookie files, use a low-privilege Bilibili account, verify every cookie storage path, and disable or patch out automatic Lark document creation if you want local-only transcription.
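One way to act on the advice to remove bundled cookie files is to scan the installed skill directory for anything that looks like a cookie store before first use. The sketch below is a heuristic under stated assumptions: it treats any file whose name contains "cookie" as a candidate, and the function name `find_cookie_files` is hypothetical, not part of the skill.

```python
from pathlib import Path


def find_cookie_files(root: Path) -> list[Path]:
    """Return files under `root` whose names suggest a stored cookie.

    Heuristic only: matches any filename containing "cookie", case-insensitively.
    It never opens or reads the files it finds.
    """
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and "cookie" in path.name.lower()
    )
```

Calling `find_cookie_files(Path("path/to/skill"))` returns candidate paths to inspect by hand. A name match does not prove a file holds live session tokens, and a cookie store with an unrelated name will be missed.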

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (39)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
if wiki_space:
    cmd.extend(['--wiki-space', wiki_space])

result = subprocess.run(
    cmd,
    input=md_content.encode('utf-8'),
    capture_output=True,
    timeout=30
)
Confidence
90% confidence
Finding
result = subprocess.run( cmd, input=md_content.encode('utf-8'), capture_output=True, timeout=30 )

subprocess module call

Medium
Category
Dangerous Code Execution
Content
# Install base dependencies first
cmd = [sys.executable, "-m", "pip", "install"] + base_dependencies
print(f"   Installing base dependencies: {' '.join(base_dependencies[:3])}...")
result = subprocess.run(
    cmd,
    capture_output=True,
    text=True,
    timeout=300
)
Confidence
95% confidence
Finding
result = subprocess.run( cmd, capture_output=True, text=True, timeout=300 )

subprocess module call

Medium
Category
Dangerous Code Execution
Content
else:
    cmd = [sys.executable, "-m", "pip", "install"] + faster_whisper_deps
    print(f"   Installing faster-whisper...")
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=600
    )
Confidence
95% confidence
Finding
result = subprocess.run( cmd, capture_output=True, text=True, timeout=600 )
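Both pip findings come from the skill installing packages into the running interpreter at run time. A more conservative pattern, sketched here as an assumption rather than the skill's actual code, is to check whether the needed modules are importable and stop with a message instead of installing. Note that `find_spec` takes import names, which can differ from pip package names (the `faster-whisper` package, for example, is imported as `faster_whisper`); the helper names are hypothetical.

```python
import importlib.util


def missing_modules(import_names: list[str]) -> list[str]:
    """Return the names from `import_names` that cannot be imported.

    This only inspects the current environment; it never installs anything.
    """
    return [name for name in import_names if importlib.util.find_spec(name) is None]


def require_modules(import_names: list[str]) -> None:
    """Stop with an actionable message instead of running pip on the user's behalf."""
    missing = missing_modules(import_names)
    if missing:
        raise SystemExit(
            "Missing dependencies: " + ", ".join(missing)
            + ". Install them yourself (ideally in a virtual environment) and re-run."
        )
```

The trade-off is convenience: the user has to install dependencies manually, but nothing is fetched from a package index without their knowledge.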

subprocess module call

Medium
Category
Dangerous Code Execution
Content
# Test the help command
import subprocess
result = subprocess.run(
    [sys.executable, str(cli_path), "--help"],
    capture_output=True,
    text=True,
    timeout=10
)
Confidence
76% confidence
Finding
result = subprocess.run( [sys.executable, str(cli_path), "--help"], capture_output=True, text=True, timeout=10 )

Tainted flow: 'audio_url' from requests.get (line 27, network input) → subprocess.run (code execution)

Critical
Category
Data Flow
Content
# Download audio
audio_file = os.path.join(output_dir, "raspberry_zero.mp3")
print(f"⬇️  Downloading audio...")
subprocess.run([
    "yt-dlp",
    "-x", "--audio-format", "mp3",
    "-o", audio_file,
    audio_url
], check=True, timeout=180)
Confidence
89% confidence
Finding
subprocess.run([ "yt-dlp", "-x", "--audio-format", "mp3", "-o", audio_file, audio_url ], check=True, timeout=180)
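Because `audio_url` arrives from a network response, it is worth validating before it reaches `yt-dlp`. The list-form `subprocess.run` call already avoids shell injection, so the remaining concerns are unexpected hosts and a value beginning with `-` being parsed as an option. This sketch assumes an allowlist of Bilibili-related hosts (the exact hosts are an assumption; the CDN host that serves audio may differ in practice) and places `--` before the URL, which `yt-dlp` accepts to end option parsing (its usage line is `yt-dlp [OPTIONS] [--] URL [URL...]`). Function and constant names are hypothetical.

```python
from urllib.parse import urlsplit

# Assumed allowlist; adjust to the hosts the skill actually downloads from.
ALLOWED_HOST_SUFFIXES = ("bilibili.com", "bilivideo.com")


def is_allowed_url(url: str) -> bool:
    """Accept only https URLs whose host is an allowlisted domain or subdomain."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    return any(host == s or host.endswith("." + s) for s in ALLOWED_HOST_SUFFIXES)


def build_ytdlp_argv(audio_url: str, audio_file: str) -> list[str]:
    """Build the yt-dlp argument list, refusing URLs outside the allowlist."""
    if not is_allowed_url(audio_url):
        raise ValueError(f"refusing to download from untrusted URL: {audio_url!r}")
    # "--" ends option parsing, so the URL can never be read as a flag.
    return ["yt-dlp", "-x", "--audio-format", "mp3", "-o", audio_file, "--", audio_url]
```

The resulting list can be passed to `subprocess.run(argv, check=True, timeout=180)` in place of the inline list.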

Description-Behavior Mismatch

Medium
Confidence
91% confidence
Finding
The README documents built-in capabilities beyond subtitle transcription, including hot-comment scraping and QR-code login with cookie management, while the stated skill description emphasizes subtitle handling. This mismatch is dangerous because users may grant trust, install, or run the skill without understanding that it can collect comments and manage persistent authentication material, increasing the chance of unexpected data access or credential exposure.

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The documented login flow includes generating a QR code and sending it via Feishu, which introduces an external transmission channel unrelated to basic transcription. This expands the trust boundary to a third-party messaging platform and creates a path for account takeover or session theft if the QR code, login state, or associated metadata is intercepted, misrouted, or retained.

Description-Behavior Mismatch

Medium
Confidence
90% confidence
Finding
The release notes describe a new capability to automatically create Feishu/Lark documents, which expands the skill from local subtitle/transcription processing into external content publishing. That change introduces data-exfiltration risk and unintended third-party side effects that are not clearly aligned with the stated skill scope, so users may trigger outbound publication without fully understanding it.

Context-Inappropriate Capability

Medium
Confidence
88% confidence
Finding
Calling `lark-cli` to publish documents gives the skill an outbound integration capable of sending transcribed content to an external platform. In a video transcription skill, that is a context-expanding capability; if not explicitly declared and consented to, sensitive transcript data, comments, or summaries could be pushed to an external workspace unexpectedly.
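A minimal way to address the consent concern is to make publishing opt-in and leave the skill local-only by default. In this sketch the environment variable `BILI_TRANSCRIBER_ALLOW_LARK` and the function names are hypothetical; `cmd` stands for the skill's existing lark-cli command list and is passed through unchanged, so no lark-cli flags are assumed.

```python
import os
import shutil
import subprocess
from collections.abc import Mapping
from typing import Optional

# Hypothetical opt-in switch; the skill does not define this variable.
OPT_IN_ENV = "BILI_TRANSCRIBER_ALLOW_LARK"


def lark_publish_allowed(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when the user has explicitly opted in to external publishing."""
    env = os.environ if env is None else env
    return env.get(OPT_IN_ENV, "").strip().lower() in {"1", "true", "yes"}


def maybe_publish(
    cmd: list[str],
    md_content: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[subprocess.CompletedProcess]:
    """Run the existing lark-cli command only after opt-in; otherwise stay local."""
    if not lark_publish_allowed(env):
        return None  # default: nothing leaves the machine
    if not cmd or shutil.which(cmd[0]) is None:
        return None  # empty command or lark-cli not on PATH
    return subprocess.run(
        cmd,
        input=md_content.encode("utf-8"),
        capture_output=True,
        timeout=30,
    )
```

Defaulting to "off" means a configured lark-cli alone is no longer enough to push transcripts to an external workspace.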

Description-Behavior Mismatch

Medium
Confidence
82% confidence
Finding
The skill documentation broadens scope into Feishu document creation and wiki publishing, which is materially different from subtitle extraction. Publishing transcriptions or summaries to enterprise knowledge bases can exfiltrate user or video-derived data to another platform, especially if done without explicit authorization and destination controls.

Description-Behavior Mismatch

Medium
Confidence
87% confidence
Finding
The skill collects hot comments and replies, which exceeds the stated subtitle/transcription/content-analysis scope. This broadens data collection beyond user expectations and may capture additional personal or sensitive user-generated content unnecessarily.

Description-Behavior Mismatch

Medium
Confidence
94% confidence
Finding
The skill creates Feishu/Lark documents containing processed content and metadata, but this capability is not disclosed in the manifest description. Hidden outbound publishing materially changes the trust boundary and can expose transcripts, comments, and video details to an external platform.

Context-Inappropriate Capability

High
Confidence
95% confidence
Finding
Invoking lark-cli to create a remote document is outside the core subtitle/transcription purpose and introduces a networked side effect with third-party data sharing. In a transcription skill, that extra capability is more dangerous because users would not reasonably expect their content to be uploaded elsewhere.

Description-Behavior Mismatch

Medium
Confidence
91% confidence
Finding
The CLI exposes functionality outside the stated subtitle/transcription scope by surfacing comment retrieval and Feishu delivery features. Scope expansion matters in agent skills because users may grant trust and credentials for one purpose, while the tool also accesses and distributes additional data, increasing privacy and data-exfiltration risk.

Context-Inappropriate Capability

Medium
Confidence
95% confidence
Finding
The cookie inspection command prints the account's email address and mobile phone details, which are unrelated to transcription and constitute unnecessary exposure of sensitive personal data. In shared terminals, logs, screenshots, or agent-mediated environments, this can leak PII without a strong operational need.
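If the cookie inspection command is kept, its output can mask personal data instead of printing it in full. The helpers below are a hypothetical sketch: they keep the first character of the email's local part and the last four digits of the phone number, which is a common but arbitrary masking choice.

```python
def mask_email(email: str) -> str:
    """Keep only the first character of the local part, e.g. a***@example.com."""
    local, sep, domain = email.partition("@")
    if not sep:
        return "***"  # not an email address; reveal nothing
    return (local[:1] + "***" if local else "***") + "@" + domain


def mask_phone(phone: str) -> str:
    """Replace all but the last four digits with asterisks."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    if len(digits) < 4:
        return "***"
    return "*" * (len(digits) - 4) + digits[-4:]
```

For example, `mask_email("alice@example.com")` returns `"a***@example.com"`.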

Description-Behavior Mismatch

High
Confidence
97% confidence
Finding
The module adds persistent cookie management and QR-based account login flows that materially exceed the advertised subtitle/transcription function of the skill. This creates an unnecessary authentication surface and enables collection, storage, and reuse of a user's Bilibili session tokens, which could be abused for account access if the skill or environment is compromised.

Context-Inappropriate Capability

Medium
Confidence
87% confidence
Finding
The code introduces a signaling path for Feishu-based QR delivery despite the skill being described as a subtitle/transcription tool. Even though this implementation only writes a local signal file, it is clearly designed to trigger external transmission of login artifacts, expanding the data-flow and increasing the risk of unintended disclosure or misuse.

Context-Inappropriate Capability

Medium
Confidence
70% confidence
Finding
The skill introduces system-level execution of yt-dlp and whisper, which materially expands the attack surface beyond simple API-based subtitle handling. In a skill environment, invoking external binaries on untrusted remote media increases exposure to binary supply-chain issues and media parser vulnerabilities, especially when not sandboxed or tightly justified.

Context-Inappropriate Capability

Medium
Confidence
84% confidence
Finding
This test routine both changes local file permissions and executes the CLI script, which are side-effecting host operations not strictly necessary for subtitle/transcription functionality. In a skill package, installation-time behavior that mutates files and runs code increases risk because users may not expect these actions from a setup verification script.

Context-Inappropriate Capability

Medium
Confidence
88% confidence
Finding
The script reads `~/.bilibili_cookie.txt` from the user's home directory, which accesses sensitive authentication material outside the package directory. Even though it only checks for presence/content, probing credential files in an installation test is broader access than necessary and normalizes handling of private tokens.
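A setup check can confirm that the cookie file at `~/.bilibili_cookie.txt` exists and is readable only by its owner without ever opening it, which avoids handling the token itself. This is a POSIX-oriented sketch (permission bits do not carry the same meaning on Windows), and `cookie_file_status` is a hypothetical name.

```python
import stat
from pathlib import Path


def cookie_file_status(path: Path) -> str:
    """Describe a cookie file's presence and permissions without reading it."""
    if not path.exists():
        return "absent"
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        # Group or other users have some access to the session token.
        return f"present, too permissive ({oct(mode)}); consider chmod 600"
    return "present, owner-only"
```

Usage: `cookie_file_status(Path.home() / ".bilibili_cookie.txt")`.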

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The document advertises QR-code login and a 1-year cookie lifetime without any warning about the sensitivity of session credentials or the risks of retaining them. In a skill whose core purpose is transcription, introducing long-lived authenticated sessions increases the blast radius of compromise and may expose users to account takeover if cookies are stored, logged, or mishandled.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The release summary states that videos longer than 20 minutes are always routed to online Whisper transcription, but it does not disclose that user audio and content may leave the local environment for third-party processing. This creates a privacy and data-handling vulnerability because users may reasonably expect local transcription, especially when the same document promotes offline processing for privacy-sensitive scenarios.
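The missing warning could be paired with an explicit consent check where the backend is chosen. The sketch keeps the 20-minute threshold from the release summary but, unlike the described behavior, falls back to local transcription for long videos unless the caller passes `allow_online=True`. The dataclass and function names are hypothetical.

```python
from dataclasses import dataclass

# Threshold taken from the release summary; everything else here is hypothetical.
ONLINE_THRESHOLD_SECONDS = 20 * 60


@dataclass
class TranscriptionPlan:
    backend: str  # "local" or "online"
    reason: str


def plan_transcription(duration_seconds: float, allow_online: bool) -> TranscriptionPlan:
    """Choose a backend, sending audio off-device only with explicit consent."""
    if duration_seconds <= ONLINE_THRESHOLD_SECONDS:
        return TranscriptionPlan("local", "short video")
    if allow_online:
        return TranscriptionPlan("online", "long video; user consented to upload")
    # Without consent, keep audio on the machine even if transcription is slower.
    return TranscriptionPlan("local", "long video; online upload not authorized")
```

The cost of this default is speed on long videos; the benefit is that audio never leaves the machine unless the user asks for it.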

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The README promotes automatic cookie handling, redundant storage, and persistent saving without clearly warning that cookies are sensitive authentication credentials. This is dangerous because users may unknowingly store long-lived session material in multiple locations, increasing the attack surface and the likelihood of account compromise from local disclosure or mishandling.

Missing User Warnings

High
Confidence
97% confidence
Finding
The README says a generated login QR code may be sent via Feishu without a clear warning that authentication material is being transmitted to an external service. This is particularly risky because QR login artifacts can enable account access; sending them off-device introduces unnecessary exposure, possible interception, and user confusion about where their authentication data is going.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The skill describes automatically generating login QR codes and sending them via Feishu when cookies expire, without explicit warning or opt-in for transmitting authentication-related artifacts off-platform. Even if the QR code is intended for the user, routing login material through another service increases exposure, logging, and misdelivery risks.

VirusTotal

65/65 vendors rated this skill as clean.
