Union Search Skill

Security checks across malware telemetry and agentic risk

Overview

This broad search skill should be reviewed because it can fetch and download content, use local credentials and cookies, and keep search logs, and because it includes a helper that is prone to command injection.

Install only if you are comfortable with a broad web retrieval tool. Avoid running it in sensitive directories, and do not provide browser or account cookies unless you understand the account access being granted. Before use, rotate or remove the hardcoded SerpAPI key and disable or manage local logs. Do not use the legacy Exa mcporter helper until its shell invocation is fixed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration

Findings (30)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: escaped_arg = call_arg.replace('"', '\\"') cmd_str = f'mcporter call "exa.{escaped_arg}"' result = subprocess.run( cmd_str, shell=True, capture_output=True,
Confidence: 98% confidence
Finding: result = subprocess.run( cmd_str, shell=True, capture_output=True, text=True, timeout=120 )
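
The flagged snippet escapes only double quotes before handing the string to a shell, so a query containing `$(...)` or backticks would still be expanded inside the quoted argument. A minimal sketch of one possible fix follows, assuming `mcporter` accepts the same `exa.<arg>` token as a single argv element; `build_exa_command` and `call_exa` are hypothetical names, not functions from the skill.

```python
import subprocess


def build_exa_command(call_arg: str) -> list[str]:
    """Build an argv list so that no shell ever parses call_arg."""
    # Assumption: the tool takes the "exa.<arg>" token as one argument,
    # mirroring the string built in the flagged snippet.
    return ["mcporter", "call", f"exa.{call_arg}"]


def call_exa(call_arg: str) -> subprocess.CompletedProcess:
    # With an argv list and shell=False (the default), each element reaches
    # the program verbatim; quotes, $(...) and backticks are plain text.
    return subprocess.run(
        build_exa_command(call_arg),
        capture_output=True,
        text=True,
        timeout=120,
    )
```

Passing a list removes the need for any escaping step, which is why it is usually preferred over trying to sanitize a shell string.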

Tp4

High

Category: MCP Tool Poisoning
Confidence: 96% confidence
Finding: This is a true security concern because the documented purpose understates materially riskier behaviors: arbitrary URL-to-Markdown fetching, media downloading via yt-dlp, RSS parsing, environment/dependency inspection, and a hardcoded SerpAPI fallback key. In an agent setting, these hidden or undocumented capabilities can be invoked under the guise of a simple search skill, enabling unreviewed outbound requests, broader content acquisition, sensitive environment disclosure, and possible secret leakage or abuse of embedded credentials.

Description-Behavior Mismatch

Medium

Confidence: 94% confidence
Finding: The README documents a `download` command for video/audio retrieval, which materially expands the skill from search into content acquisition. In an agent context, this scope mismatch is dangerous because operators may grant permissions, bandwidth, storage, or trust based on a search-only description while the skill can trigger high-impact network and file operations.

Description-Behavior Mismatch

Medium

Confidence: 93% confidence
Finding: Advertising full media download capability beyond the stated search-focused purpose indicates undocumented functionality expansion. Even if intended as a feature, hidden or under-declared capability increases operational risk because it can be used to fetch and store external content outside the expected trust boundary for a search skill.

Context-Inappropriate Capability

Low

Confidence: 91% confidence
Finding: Automatic local logging and archival of full search requests and results creates unnecessary data retention for a search skill. Search queries can contain sensitive prompts, internal URLs, tokens, personal data, or research topics, and storing them in plaintext increases exposure through local compromise, backup leakage, or accidental sharing.

Description-Behavior Mismatch

Medium

Confidence: 87% confidence
Finding: The configured Exa MCP endpoint exposes tools that go beyond the skill's described unified search use case, including code-context retrieval, crawling, company/people research, and deep-research workflow functions. This creates a scope mismatch: if the agent can invoke all listed tools, it may access or collect more data than users and reviewers expect, increasing the risk of over-privileged behavior and unintended data gathering.

Context-Inappropriate Capability

Medium

Confidence: 95% confidence
Finding: This troubleshooting guidance explicitly recommends using proxies in response to access denials and anti-bot conditions, which facilitates bypassing platform restrictions rather than legitimate search functionality. In the context of a multi-platform scraping/search skill, this increases the risk that users will evade service controls and conduct unauthorized collection at scale.

Context-Inappropriate Capability

High

Confidence: 98% confidence
Finding: The document advises using proxies to avoid IP bans and anti-crawling restrictions across third-party services, which is classic access-circumvention guidance. For a search skill, this is not necessary to fulfill its stated purpose and materially lowers the barrier to abusive scraping or policy evasion.

Context-Inappropriate Capability

High

Confidence: 99% confidence
Finding: Telling users to use another account's cookie to access otherwise unavailable Weibo data encourages credential/account sharing to bypass platform restrictions. That crosses from troubleshooting into access circumvention and could enable unauthorized data access under someone else's session.

Context-Inappropriate Capability

Medium

Confidence: 97% confidence
Finding: The module enumerates every environment variable matching SERPAPI_API_KEY* and also embeds a hardcoded fallback API credential in source. This exceeds the minimum privilege needed for a search client, creates undisclosed credential-use behavior, and hardcoded secrets can be extracted and abused by anyone with code access. In an agent skill context, automatic harvesting of multiple keys increases the risk of using unrelated operator credentials without explicit consent or auditability.
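
As an illustration of the narrower behavior this finding points toward, the sketch below reads exactly one named variable and fails loudly instead of falling back to an embedded key. The function name `load_serpapi_key` and the injectable `environ` parameter are assumptions made for the example.

```python
import os
from typing import Mapping, Optional


def load_serpapi_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from SERPAPI_API_KEY only; never scan similar names."""
    env = os.environ if environ is None else environ
    key = env.get("SERPAPI_API_KEY", "").strip()
    if not key:
        # No hardcoded fallback: the operator must supply a key explicitly.
        raise RuntimeError("SERPAPI_API_KEY is not set")
    return key
```

Variables such as `SERPAPI_API_KEY_2` are ignored here, so the credential in use is always the one the operator named.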

Description-Behavior Mismatch

Medium

Confidence: 88% confidence
Finding: The documented `download` command expands the skill from search and image retrieval into arbitrary media downloading via `yt-dlp`, which is a materially broader capability than the stated scope. In an agent setting, this can enable unreviewed retrieval of remote content, use of authenticated session material, and interaction with third-party platforms in ways users may not expect from a search skill.

Context-Inappropriate Capability

Medium

Confidence: 94% confidence
Finding: Automatically discovering and using cookie files from fixed local paths can silently pull authenticated browser/session credentials into downloads without explicit user approval. In an agent skill, this is especially dangerous because it broadens access to user accounts and paid/private content, and may leak or misuse local credential material outside the core search purpose.

Description-Behavior Mismatch

Medium

Confidence: 91% confidence
Finding: The file exposes a generic URL-to-content extraction function via run_defuddle, which is broader than the stated skill purpose of federated search and image download. In an agent setting, this expands capability scope and can be abused to fetch and transform arbitrary remote content, increasing SSRF, data-exfiltration, and policy-bypass risk depending on how URLs are sourced.

Context-Inappropriate Capability

Medium

Confidence: 90% confidence
Finding: Allowing arbitrary URL content extraction is not necessary for a union-search adapter and materially increases the attack surface. In practice, an LLM or user could direct the skill to retrieve internal or sensitive web resources under the guise of 'content extraction,' creating a capability mismatch that is more dangerous in an agentic environment.
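
One common way to constrain URL fetching of this kind is to reject any URL whose host resolves to a non-public address before fetching it. The sketch below is a hedged example of that check, not the skill's code; `is_public_http_url` and its injectable `resolve` parameter are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_http_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Return True only for http(s) URLs whose host resolves to global IPs."""
    try:
        parts = urlsplit(url)
        host, port = parts.hostname, parts.port
    except ValueError:  # malformed host or port
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        infos = resolve(host, port or (443 if parts.scheme == "https" else 80))
    except OSError:  # includes DNS resolution failures
        return False
    # Drop any IPv6 scope id ("%eth0") before parsing the address.
    addresses = [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]
    # Loopback, private, and link-local ranges (such as 169.254.169.254)
    # are not global, so they are rejected.
    return bool(addresses) and all(addr.is_global for addr in addresses)
```

A check like this narrows the exposure but does not close it on its own: the fetcher would also need to connect to the address that was validated and re-check every redirect target.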

Description-Behavior Mismatch

Medium

Confidence: 91% confidence
Finding: The CLI exposes materially broader capabilities than the stated search-oriented skill scope, including arbitrary media downloading via yt-dlp and webpage extraction to Markdown. In an agent setting, this scope expansion increases the attack surface and can enable data acquisition, policy bypass, or unintended content retrieval that callers may not expect from a 'search' skill.

Context-Inappropriate Capability

Medium

Confidence: 83% confidence
Finding: The code loads arbitrary key=value pairs from a local env file directly into process environment state, which can silently introduce credentials, proxies, or other sensitive runtime controls into the downloader. In a skill that already performs network access and invokes yt-dlp, this expands capability beyond simple search/download and can cause unintended authenticated or redirected requests using locally stored secrets.

Context-Inappropriate Capability

Medium

Confidence: 96% confidence
Finding: The downloader automatically searches local filesystem paths and an environment-specified path for cookie files, then uses them for YouTube downloads without explicit per-request consent. This can cause the skill to perform authenticated access with a user's session artifacts, exposing private account context and exceeding the expected scope of a general search/download tool.
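
To make the consent concern concrete, the following sketch builds a yt-dlp command line that uses cookies only when the caller passes a path for that specific request, with no search of fixed local paths or environment variables. `build_ytdlp_args` is a hypothetical helper; `--cookies` is the yt-dlp option for reading a cookie file.

```python
from pathlib import Path
from typing import Optional


def build_ytdlp_args(url: str, cookies_file: Optional[str] = None) -> list[str]:
    """Build yt-dlp argv; cookies are used only when explicitly requested."""
    args = ["yt-dlp"]
    if cookies_file is not None:
        path = Path(cookies_file)
        if not path.is_file():
            raise FileNotFoundError(f"cookie file not found: {path}")
        args += ["--cookies", str(path)]
    # "--" ends option parsing so a URL starting with "-" is not read as a flag.
    args += ["--", url]
    return args
```

With this shape, an unauthenticated download is the default and authenticated access is a visible, per-call decision.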

Context-Inappropriate Capability

Medium

Confidence: 94% confidence
Finding: Supporting --cookies-from-browser allows the tool to direct yt-dlp to extract cookies from a browser profile, enabling authenticated requests against the user's accounts. In this skill context, that is materially more dangerous because it grants access to local browser session state rather than just downloading public content.

Context-Inappropriate Capability

Medium

Confidence: 95% confidence
Finding: The skill's intended purpose is search, but it implements that by launching an external shell command, expanding the attack surface beyond what the functionality requires. In this context, the wrapper accepts arbitrary query strings from users and passes them into a shell-mediated tool invocation, making misuse and injection materially more dangerous than a direct API call would be.

Context-Inappropriate Capability

Medium

Confidence: 90% confidence
Finding: The tool automatically loads environment variables from a user-selected file, defaulting to .env, even though RSS searching does not appear to require secrets. In an agent or automation context, this can unnecessarily ingest local secret material into the process and make it available to later code paths, logs, subprocesses, or accidental disclosure if the skill is invoked in a sensitive working directory.
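
A narrower alternative to loading a whole env file into the process environment is to parse it and keep only the keys a given tool actually needs. The sketch below assumes a simple `KEY=value` file format; `parse_env_text` and the allowlist parameter are illustrative names.

```python
from typing import Dict, FrozenSet


def parse_env_text(text: str, allowed_keys: FrozenSet[str]) -> Dict[str, str]:
    """Return only allowlisted KEY=value pairs; os.environ is never modified."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        if key in allowed_keys:
            # Trim surrounding whitespace and simple quoting.
            values[key] = value.strip().strip("\"'")
    return values
```

Returning a dictionary instead of mutating the environment keeps unrelated secrets and proxy settings out of later code paths and subprocesses.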

Description-Behavior Mismatch

Medium

Confidence: 82% confidence
Finding: The skill includes URL-reading/content-extraction behavior beyond simple search aggregation, which materially expands its attack surface into server-side fetching of arbitrary URLs. In an agent context, this can enable SSRF-style access to internal services, cloud metadata endpoints, or sensitive intranet resources if URL fetching is not strictly constrained.

Context-Inappropriate Capability

Medium

Confidence: 96% confidence
Finding: The skill automatically logs search queries and aggregated results to local storage without the user explicitly opting in. Search terms and fetched results can contain sensitive personal, corporate, or investigative data, so automatic archival creates an unnecessary data-retention and privacy risk.
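
For comparison, an opt-in design would write nothing unless the caller names a log directory, and would record only minimal metadata instead of full results. This is a sketch under those assumptions; `log_search` and the `search.log.jsonl` file name are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def log_search(query: str, result_count: int, log_dir: Optional[str] = None) -> Optional[Path]:
    """Append one JSON line per search, but only when log_dir is given."""
    if log_dir is None:
        return None  # logging is off by default
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    log_path = directory / "search.log.jsonl"
    with log_path.open("a", encoding="utf-8") as handle:
        # Store the result count, not the fetched result bodies.
        handle.write(json.dumps({"query": query, "result_count": result_count}) + "\n")
    return log_path
```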

Description-Behavior Mismatch

Medium

Confidence: 92% confidence
Finding: The async extractors perform outbound fetches to third-party services such as publish.twitter.com, api.fxtwitter.com, and old.reddit.com based on user-supplied URLs. In an agent skill that is expected to convert URLs/content, this creates an unannounced external transmission path and can leak user targets, browsing intent, or sensitive internal URLs to external services, especially if the skill is run in trusted or enterprise contexts.

Context-Inappropriate Capability

Low

Confidence: 92% confidence
Finding: The script searches for VOLCENGINE_API_KEY in multiple .env files, including the current working directory, which expands credential discovery beyond the tool's own configuration scope. In an agent or automation context, an attacker who can influence the working directory or place a malicious .env file nearby could cause the tool to silently consume unintended credentials and use them for outbound requests.

Context-Inappropriate Capability

Medium

Confidence: 97% confidence
Finding: The script intentionally rotates among 20 User-Agent strings and explicitly states this is to avoid a fixed UA, while other logic acquires cookies and retries requests to bypass anti-scraping controls. In the context of an agent skill that provides automated cross-platform search, this increases the capability for stealthy scraping and terms-of-service evasion, which makes the skill more dangerous than a normal search wrapper.

VirusTotal

VirusTotal findings are pending for this skill version.
