Back to skill

Security audit

YouTube Comment Moderator

Security checks across malware telemetry and agentic risk

Overview

This is a coherent YouTube moderation skill, but it needs Review because it can post and delete channel comments using locally stored OAuth credentials and does not fully disclose or constrain that power.

Install only if you are comfortable granting YouTube comment write access. Start with monitor or dry-run mode, review the queue before approving deletions or replies, avoid full auto and cron until results are proven reliable, paste only the OAuth code when possible, and protect or delete oauth.json, config.json, .env, and the SQLite database when no longer needed.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
  • Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool PoisoningHidden Instructions, Unicode Deception, Parameter Description Injection
  • Privilege EscalationExcessive Permissions, Sudo/Root Execution, Credential Access
Findings (23)

Intent-Code Divergence

Medium
Confidence
97% confidence
Finding
The setup text says approval mode requires user approval before posting, but the saved configuration also enables automatic spam deletion in approval mode. That mismatch can cause destructive actions the operator did not consent to, especially in a moderation tool that can delete public comments.

Intent-Code Divergence

Medium
Confidence
93% confidence
Finding
The OAuth flow says users can skip authorization for read-only mode, but elsewhere approval mode still sets auto_delete_spam true. This creates a misleading safety boundary where users may believe writes are disabled, while configuration semantics still request destructive behavior once credentials exist or are later added.

Intent-Code Divergence

Medium
Confidence
96% confidence
Finding
The setup text says approval mode requires manual approval before posting, but the generated config enables automatic spam deletion in approval mode via `auto_delete_spam: mode in ("auto", "approval")`. That mismatch can cause users to authorize a less destructive workflow than they actually receive, leading to unintended deletion of legitimate comments.

Vague Triggers

Medium
Confidence
79% confidence
Finding
The invocation language is broad enough to trigger this skill for many generic moderation, sentiment, or pipeline-building requests, even when the user did not clearly intend to authorize YouTube API access or automated moderation actions. In context, this is more dangerous because the skill can progress from analysis to live replies/deletions using stored credentials, increasing the chance of unintended activation on a real channel.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The skill promotes automatic deletion and live moderation without a prominent, upfront warning that these actions can irreversibly affect user content and channel interactions. In this context, the danger is elevated because the workflow includes an 'auto' mode that can delete comments and post replies at scale once OAuth is configured, making accidental or misunderstood execution materially harmful.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The guide tells users to place an API key, OAuth client ID, and OAuth client secret in a `.env` file but does not warn them to protect that file, exclude it from version control, or avoid sharing it. In a real setup, `.env` files are commonly committed accidentally, which can expose credentials that enable unauthorized API use or aid further compromise of the YouTube moderation integration.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The script sends raw comment author names and comment text to a third-party Gemini API for classification, which is a real data-exposure/privacy issue because user-generated content may contain personal or sensitive information. The risk is heightened by the skill context because moderation pipelines may process large volumes of comments automatically, and the script provides no built-in notice, consent mechanism, redaction, or configuration to keep processing local.

Missing User Warnings

Medium
Confidence
83% confidence
Finding
The script stores raw comment text together with author display names and channel identifiers to disk without any privacy notice, minimization, retention control, or access restriction. In a moderation context, this can accumulate personally identifiable or sensitive user-generated content, increasing the risk of unauthorized disclosure or secondary misuse if the file is accessed by other local users, tools, or logs.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
Raw YouTube comments, author names, and channel context are sent to Gemini for classification, which transmits third-party content to an external AI service. In a moderation tool this is contextually relevant functionality, but without explicit operator notice/consent, data-minimization, or privacy controls, it creates a real privacy and compliance risk.

Missing User Warnings

Medium
Confidence
87% confidence
Finding
The non-interactive exchange path silently persists OAuth access and refresh tokens to disk. Storing long-lived tokens without prominent disclosure or permission hardening increases the chance of credential leakage from local filesystem exposure, backups, or multi-user environments.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The generated config stores the YouTube API key directly in config.json without a clear warning. Writing credentials into a general configuration file increases accidental disclosure risk through source control commits, logs, support bundles, or permissive file sharing.

Vague Triggers

Medium
Confidence
87% confidence
Finding
The description invites use with broad phrases like moderating comments, analyzing sentiment, and managing spam without clear guardrails or confirmation thresholds. In an agent ecosystem, this can cause the skill to be invoked in loosely related situations and escalate into actions such as replying to or deleting comments with insufficient user intent verification.

Missing User Warnings

High
Confidence
97% confidence
Finding
The skill prominently advertises automatic deletion of comments but does not present a clear user-facing warning that this is a destructive, potentially irreversible action. Because the skill also supports auto mode and deletion of spam/hate, an incorrect classification or accidental invocation could remove legitimate audience comments at scale and harm user trust or channel operations.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
Asking the user to paste the full OAuth redirect URL back into the system can expose sensitive authorization data such as codes, state values, and account-linked parameters if logs, transcripts, or other tools retain the message. In this skill's context, that data could be exchanged for channel-management tokens, making the exposure more sensitive than a generic URL-sharing pattern.

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The guide tells users to create an API key and OAuth client secret, place them in a .env file, and run setup, but it provides no warning to keep these credentials private, avoid committing them to source control, or protect oauth.json tokens. In a skill that can reply to and delete YouTube comments, leaked credentials or tokens could enable unauthorized access to channel data and moderation actions.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The script sends raw comment text and author handles to the external Gemini API for classification, but provides no notice, consent flow, masking, or configuration guard beyond requiring an API key. This creates a real data exposure risk because user-generated content and associated identifiers are transmitted to a third party, which may violate privacy expectations, internal data-handling rules, or platform compliance requirements.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The classifier sends raw YouTube comment text plus author display names to Gemini, which is a third-party AI service, without any in-code consent gate, minimization, or warning at the point of transmission. This can expose personal data and sensitive user-generated content to an external processor, creating privacy, compliance, and trust risks even if the network call is functionally intended.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
Reply drafting sends the commenter name and full comment text to Gemini to generate a response, again disclosing user content to an external AI provider without an explicit consent or warning control. Because this path may process arbitrary public comments at scale, the skill context increases the likelihood of transmitting personal data, abuse content, or other sensitive material to a third party.

Missing User Warnings

Medium
Confidence
85% confidence
Finding
The non-interactive OAuth exchange writes access and refresh tokens to `oauth.json` on disk without warning the operator or applying any permission hardening. On multi-user systems or misconfigured environments, local credential files can be exposed, enabling unauthorized access to YouTube account actions.

Ssd 4

Medium
Confidence
91% confidence
Finding
User-supplied comment text is concatenated directly into the same prompt stream as the classifier instructions, so adversarial comments can attempt prompt injection such as telling the model to ignore prior instructions or alter output formatting. In this skill, model output drives moderation actions like delete, reply, and flag_review, so prompt manipulation could cause misclassification and unsafe downstream actions at scale.

Ssd 1

Medium
Confidence
96% confidence
Finding
Untrusted user comments are inserted directly into the LLM prompt used for classification without a strong boundary telling the model to treat comment text purely as data. A malicious commenter can embed prompt-injection content to bias moderation decisions, potentially causing spam to evade deletion or benign comments to be misclassified and removed.

Ssd 1

Medium
Confidence
97% confidence
Finding
The reply-generation prompt embeds attacker-controlled comment text in the same instruction context as the system task. A malicious commenter can inject instructions that manipulate the drafted reply, causing embarrassing, policy-violating, or brand-damaging responses that may then be auto-posted via OAuth.

Ssd 4

Medium
Confidence
97% confidence
Finding
Untrusted comment text is concatenated directly into the same prompt as the classifier instructions, so a crafted comment can include prompt-injection content that biases or subverts downstream classifications. Because the model is asked to process many comments in one instruction stream, one malicious comment can potentially affect classification results for other comments in the batch, leading to bad moderation actions such as failing to delete spam or misclassifying legitimate users.

VirusTotal

64/64 vendors flagged this skill as clean.

View on VirusTotal