Principles Agent

Security checks across malware telemetry and agentic risk

Overview

This is a disclosed planning and LLM-orchestration skill with notable reliability and data-handling caveats, but no hidden system, credential, network, persistence, or destructive behavior was found.

Install only if you want a planning framework that routes goals and intermediate task outputs through OpenClaw's configured LLM callable. Avoid using it with secrets, regulated data, or private customer data unless that LLM environment is approved, and expect possible failures or incorrect validation until the parsing bugs are fixed.

SkillSpector

By NVIDIA

Vulnerability Patterns

Supply ChainUnpinned Dependencies, External Script Fetching, Obfuscated Code
Trigger AbuseOverly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool PoisoningHidden Instructions, Unicode Deception, Parameter Description Injection
Prompt InjectionInstruction Override, Hidden Instructions, Exfiltration Commands
Data ExfiltrationExternal Transmission, Env Variable Harvesting, File System Enumeration

Findings (7)

Intent-Code Divergence

Medium

Confidence: 98% confidence
Finding: `derive_truths` builds a prompt from user-controlled goal fields and then immediately passes that prompt into `_parse_response`, which calls `_extract_json` and `json.loads` as if the text were a trusted LLM reply. This creates a prompt/response confusion bug: attacker-controlled braces or JSON-like content embedded in the goal can be parsed as authoritative output, causing denial of service via parse errors or incorrect derived truths that downstream components may trust.

Intent-Code Divergence

Medium

Confidence: 98% confidence
Finding: `re_derive_truths` has the same prompt/response confusion flaw, but now also incorporates attacker-influenced `feedback` into the text that is parsed as JSON. An adversary can inject structured content or malformed braces in feedback to crash processing or steer the returned `FundamentalTruth` objects and relationship text without any LLM call occurring.

Intent-Code Divergence

Medium

Confidence: 99% confidence
Finding: `validate_task` claims to validate task output, but it never invokes an external validator and instead parses the prompt it just constructed as if it were a model response. This causes validation to fail open/incorrectly at runtime or produce unusable results, meaning downstream orchestration may rely on broken assurance checks and accept or reject task outputs based on invalid logic.

Intent-Code Divergence

Medium

Confidence: 99% confidence
Finding: The comment says the real LLM call occurs at a higher layer, but the code immediately parses the locally generated prompt, creating a logic flaw that defeats the intended validation control. In a system that uses this validator as a trust gate, this undermines integrity of task approval and can allow unsafe or noncompliant outputs to move through the pipeline unchecked if surrounding code assumes validation occurred.

Vague Triggers

Medium

Confidence: 91% confidence
Finding: The trigger conditions are very broad and describe common activities such as designing complex systems, deep analysis, and multi-step collaboration. In an agent ecosystem, this can cause the skill to activate for many ordinary requests, potentially overriding more appropriate tools, increasing prompt surface area, and causing unintended task routing or excessive autonomy.

Missing User Warnings

Medium

Confidence: 93% confidence
Finding: The executor copies prior subtask outputs directly into the next LLM prompt, so any sensitive or adversarial content from an earlier step is propagated downstream without filtering, minimization, or disclosure. This creates prompt-injection and data-leakage risk because later model calls may obey instructions embedded in prior outputs or unnecessarily reproduce confidential content.

Ssd 3

Medium

Confidence: 96% confidence
Finding: Prior task outputs are injected into subsequent prompts as raw text, including up to 2000 characters of each dependency output. In a multi-step agent pipeline, this can leak sensitive user content across task boundaries and amplify prompt injection, because malicious text in an earlier result can influence later model behavior when embedded alongside trusted instructions.

VirusTotal

65/65 vendors flagged this skill as clean.

View on VirusTotal