Security audit

Agentic Engineering

Security checks across malware telemetry and agentic risk

Overview

This skill is mostly a disclosed agent-engineering toolkit, but it bundles broad scraping, local persistence, hardcoded messaging, and exposed credential/session details that need manual review before use.

Install only if you are comfortable reviewing and editing it first. Before use: remove the exposed keys, JWT-shaped examples, and personal paths; replace the hardcoded Feishu recipient; restrict or disable scraping unless you need the AI market report; and do not run the AutoGen Studio command that binds to 0.0.0.0 unless the host sits on a protected network.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (20)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 90%
Finding: The skill advertises no explicit permissions while its content and linked branches imply access to shell, network, environment data, and file writing. That mismatch weakens user consent and review controls, making it easier for a broadly triggered skill to perform sensitive actions unexpectedly.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The declared purpose is a generic agent-engineering framework, but the analyzed behavior includes scraping, polling external endpoints, local persistence, report generation, and pushing messages to a hardcoded recipient. This hidden expansion of capability is dangerous because users may invoke the skill for architecture guidance while unknowingly enabling data collection, outbound communications, and stateful automation.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: This file materially diverges from the declared skill purpose: instead of agent-engineering orchestration, it implements vendor intelligence, pricing aggregation, ranking, and report generation. In a skill ecosystem, that mismatch is dangerous because users and reviewers may grant the skill broader trust, permissions, or deployment context based on the manifest while hidden data-processing behavior performs unrelated collection and analysis.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The skill contains vendor/news intelligence and pricing-analysis logic that is not justified by the stated agent-engineering use case. Even without obvious code execution risk, hidden or unrelated collection/analysis capabilities expand the skill's operational scope, increasing the chance of unauthorized data use, user surprise, and risky permissioning in environments that trust the manifest.

Intent-Code Divergence

Medium

Confidence: 86%
Finding: The report claims pricing is computed from real-time or scraped actual prices, but the code silently substitutes manually maintained fallback values. This is dangerous because it can mislead downstream decision-making and mask stale or fabricated inputs, especially in a tool that presents rankings as authoritative market intelligence.
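
One way to make such a substitution visible is to carry provenance with every price. The following is a minimal sketch, not the skill's code: `PriceQuote`, `resolve_price`, `render_line`, and the sample data in the usage note are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PriceQuote:
    """A price together with a label saying where it came from (hypothetical type)."""
    vendor: str
    price: float
    source: str  # "scraped" or "fallback"


def resolve_price(
    vendor: str,
    scraped: Mapping[str, float],
    fallback: Mapping[str, float],
) -> Optional[PriceQuote]:
    """Return a quote labeled with its origin, or None if no value exists."""
    if vendor in scraped:
        return PriceQuote(vendor, scraped[vendor], "scraped")
    if vendor in fallback:
        # The fallback is still used, but it is labeled instead of silent.
        return PriceQuote(vendor, fallback[vendor], "fallback")
    return None


def render_line(quote: PriceQuote) -> str:
    """Format one report line; fallback values carry an explicit note."""
    note = "" if quote.source == "scraped" else " (manually maintained fallback)"
    return f"{quote.vendor}: {quote.price:.2f}{note}"
```

With this shape, `render_line(resolve_price("b", {"a": 1.0}, {"b": 2.5}))` yields `b: 2.50 (manually maintained fallback)`, so a reader of the report can tell which rankings rest on stale inputs.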

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: This file performs automated collection of data from a dozen external third-party sites using a headless browser, which materially expands the skill's network reach beyond its stated purpose of agent-engineering guidance. In an agent skill context, undisclosed broad web access is risky because it can trigger unintended outbound requests, collect arbitrary site content, and create compliance, privacy, and supply-chain exposure if the skill is invoked in environments that did not expect live scraping.
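
A reviewer who wants to keep some scraping but bound its reach could gate every fetch on an explicit host allowlist. The sketch below assumes a hypothetical helper `is_allowed_url` and placeholder hostnames; it is not part of the audited skill and only shows the shape of the check.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed_url(url: str, allowed_hosts: Iterable[str]) -> bool:
    """Accept only https URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    # Exact match only: "example.com.evil.net" must not pass for "example.com".
    return parts.hostname.lower() in {host.lower() for host in allowed_hosts}
```

Calling this before each browser navigation means an unexpected target fails closed instead of producing a silent outbound request.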

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: This code performs outbound scraping of third-party news and pricing pages and collects page text into structured results, which is materially broader than the declared skill purpose of agent-engineering coordination and architecture choice. In a skill context, hidden or unjustified network collection increases supply-chain risk, can exfiltrate browsing-derived data, and may violate user expectations or organizational policy even if the targets are public websites.

Context-Inappropriate Capability

Low

Confidence: 89%
Finding: The script writes scraped vendor content and metadata to local JSON reports without any retention controls, user approval flow, or minimization beyond simple truncation. While this is not remote code execution, it creates an unjustified local data-harvesting/reporting capability that can accumulate sensitive or policy-restricted content over time.
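
A simple retention control is to delete reports past a fixed age each time the script runs. The sketch below assumes the reports are `*.json` files in a single directory; `prune_reports` and its parameters are hypothetical and do not come from the skill.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_reports(
    report_dir: Path,
    max_age_days: float,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete *.json files older than max_age_days and return the deleted paths."""
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400  # seconds per day
    removed: List[Path] = []
    for path in sorted(report_dir.glob("*.json")):
        # Compare the file's modification time against the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

The `now` parameter exists only so the cutoff can be fixed in tests; in normal use it can be left unset.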

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The file implements broad web-scraping of third-party pricing and news pages, which is materially different from the declared 'agent-engineering' orchestration purpose of the skill. In an agent ecosystem, this kind of capability mismatch is dangerous because it can enable undeclared external data collection, policy bypass, or surprise network activity under a misleading skill description, reducing operator oversight and increasing supply-chain risk.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The comment says new vendors are not automatically added, but scan() actually calls get_or_create() for names derived from external search results and then persists them to vendor_registry.json. This mismatch can cause operators to trust the scanner as non-mutating for unknown entities when it will in fact accept and store attacker-influenced names, enabling registry poisoning and downstream workflow manipulation if later components trust this registry.
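
One way to make the behavior match the comment is to split lookup from creation, so that names derived from search results can only be queued for review. This is a minimal sketch under that assumption: the `VendorRegistry` class, its methods, and the name pattern are hypothetical and are not the skill's actual `get_or_create()` implementation.

```python
import re
from typing import Dict, List, Optional

# Assumed policy: short names made of letters, digits, space, dot, underscore, hyphen.
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 ._-]{0,63}")


class VendorRegistry:
    """Registry where unknown names are queued for review, never auto-added."""

    def __init__(self, known: Dict[str, dict]) -> None:
        self._known = dict(known)
        self.pending: List[str] = []

    def lookup(self, name: str) -> Optional[dict]:
        """Read-only lookup; never creates an entry."""
        return self._known.get(name)

    def propose(self, name: str) -> bool:
        """Queue a well-formed unknown name for human review."""
        if name in self._known or name in self.pending:
            return False
        if not _NAME_RE.fullmatch(name):
            return False
        self.pending.append(name)
        return True

    def approve(self, name: str) -> None:
        """Explicit operator step that promotes a pending name to the registry."""
        self.pending.remove(name)
        self._known[name] = {}
```

Under this split, a scanner would call `lookup()` and `propose()` only, and persisting new entries would require the separate `approve()` step.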

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The skill document exposes concrete secrets and sensitive infrastructure details, including API keys, a JWT, local database locations, and auth configuration paths. Even though some values are partially redacted, this materially increases the risk of credential misuse, lateral movement, and targeted compromise of the local AutoGen Studio environment, and it is not necessary for a composing skill's stated purpose.
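
Before publishing a skill document, a quick pass for JWT-shaped strings can catch the most obvious leaks. The sketch below is a heuristic only: it relies on the fact that JWT header and payload segments are base64url-encoded JSON objects, which typically begin with `eyJ`, and it will miss partially redacted tokens and other key formats. `redact_jwt_like` and the placeholder text are hypothetical names.

```python
import re
from typing import Tuple

# Three dot-separated base64url segments; the first two start with "eyJ".
JWT_LIKE = re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def redact_jwt_like(text: str, placeholder: str = "<REDACTED_JWT>") -> Tuple[str, int]:
    """Replace JWT-shaped substrings and return (new_text, number_replaced)."""
    return JWT_LIKE.subn(placeholder, text)
```

A nonzero count is a signal to rotate the credential, not just to delete the string, since the document may already have been copied.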

Context-Inappropriate Capability

Medium

Confidence: 87%
Finding: The skill includes administrative details about local AutoGen Studio infrastructure, such as a concrete database glob path and MCP script location, which exceed the declared creative-generation function. This expands the attack surface by disclosing where operational components and state are stored, making follow-on targeting and unauthorized manipulation easier.

Vague Triggers

Medium

Confidence: 85%
Finding: The trigger conditions are extremely broad, covering multi-agent design, architecture decisions, coding, composing, supervising, and quantitative trading. Overbroad activation increases the chance the skill runs in unrelated contexts, which is especially risky here because the skill appears to have hidden operational capabilities beyond simple guidance.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill makes multiple outbound requests and stores fetched content locally without explicit notice, consent, or prominent disclosure of network and retention behavior. In an agent-skill environment, undisclosed network activity is dangerous because users may invoke the skill expecting local reasoning only, while the code quietly contacts external sites and saves retrieved data.

Vague Triggers

Medium

Confidence: 76%
Finding: The trigger condition is extremely broad, covering essentially any task involving deep reasoning and multimedia output. Overbroad activation increases the chance that the skill is invoked in unintended contexts, causing unnecessary transmission of prompts or use of sensitive back-end integrations without the user's informed intent.

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill documents direct use of third-party APIs and tokenized access without any warning that prompts and outputs may be transmitted to external services. In a creative skill context, users may provide proprietary drafts, personal data, or internal content, so silent external transmission materially raises privacy and confidentiality risk.

Missing User Warnings

Medium

Confidence: 91%
Finding: The operational instructions describe networked local service interaction, session creation, websocket use, and service startup without warning about privacy, integrity, or the risks of exposing services on all interfaces. This is especially sensitive because the examples combine authenticated local orchestration with run control, making misuse or accidental exposure more plausible.
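
The exposure risk comes down to the bind address: a loopback address such as 127.0.0.1 accepts local connections only, while 0.0.0.0 listens on every interface. The sketch below shows a pre-start check a wrapper script could run; `is_loopback_only` is a hypothetical helper, and how the host value reaches the service is left out because it depends on the actual startup command.

```python
import ipaddress


def is_loopback_only(host: str) -> bool:
    """Return True only for hosts that restrict the service to the local machine."""
    if host == "localhost":
        return True
    try:
        # 127.0.0.0/8 and ::1 are loopback; 0.0.0.0 and :: are not.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as potentially exposed.
        return False
```

A wrapper could refuse to start, or require an explicit override, whenever this returns False.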

Vague Triggers

Medium

Confidence: 87%
Finding: The trigger description is broad and open-ended, covering general scenarios like multi-agent coordination, state-machine design, and supervision without clear exclusion criteria or input constraints. In an agent-routing system, this can cause the skill to activate in unintended contexts, leading to incorrect orchestration behavior, over-broad authority, or unsafe application of supervising patterns to tasks that do not require them.

Natural-Language Policy Violations

Medium

Confidence: 82%
Finding: The skill metadata and content are written as if Chinese is the required interaction language, without offering language negotiation or stating that output language should follow the user's preference. This can cause misunderstanding of supervisory instructions, approval checkpoints, or safety-critical orchestration details when used by users or agents operating in other languages.

Vague Triggers

Medium

Confidence: 89%
Finding: The trigger description is broad enough to match a wide range of finance-, architecture-, and multi-agent-related requests, which can cause this skill to activate outside its intended quant-trading scope. In an agent system, over-broad activation can route unrelated prompts into trading-oriented logic, increasing the chance of unsafe financial guidance, incorrect tool usage, or unintended autonomous decision flows.

VirusTotal

65/65 vendors reported this skill as clean.

Static analysis

No suspicious patterns detected.