Claude Clarity

Security checks across malware telemetry and agentic risk

Overview

The skill is a coherent local cognitive-memory/MCP engine, but it warrants Review because it combines automatic conversation logging, persistent changes to agent configuration, and a dispatch surface capable of direct code execution.

Install it only if you want a local persistent-memory agent extension and are comfortable with it writing conversation history and MCP configuration files. Review and limit use of the direct dispatch and code-execution APIs. Avoid sending sensitive chats through clarity_think unless you accept that they will be logged locally, and set the external LLM or embedding environment variables only when you intend to transmit text to those providers.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (227)

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The document explicitly says the current core engine does not implement automated personality tracking or calculation, yet elsewhere it presents precise current values for personality, personhood, and related metrics as if they were real system measurements. In an agent skill, this can mislead operators, downstream agents, or users into trusting fabricated governance or safety telemetry and making decisions based on nonexistent controls.

Intent-Code Divergence

Low

Confidence: 81%
Finding: The file header declares version v7.3.103, while later sections repeatedly refer to other versions as current, such as v7.2.3 and core engine v1.1.5, creating an internally inconsistent requirements baseline. This undermines trust in the documented configuration and can cause reviewers or automated systems to rely on the wrong specification, especially in documents that govern safety-sensitive agent behavior.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The guide presents a wide set of capabilities (emotion analysis, personality scoring, autonomous reasoning, intervention, and clinical assessment) that materially exceed a memory-focused skill description. This can mislead deployers about what data the skill may process and what decisions it may make, increasing the risk of unsafe integration, over-permissioning, and trust in unsupported high-stakes behavior.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: Although the document says the examples are conceptual and not actual APIs, it immediately presents them in a concrete package/interface form that looks directly usable. That inconsistency can cause users or downstream agents to rely on fictitious APIs and unsupported behavior, which is especially risky when the examples imply sensitive cognitive or emotional processing and include authoritative-sounding performance claims.

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The document explicitly states that some headline features are only design concepts and are not present in the current core engine, yet the nearby feature list still presents those capabilities as active functionality. In an agent skill context, this can mislead users or downstream agents into assuming self-review, autonomous evolution, or personality-tracking behaviors exist when they do not, creating unsafe reliance on nonexistent safeguards or controls.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The document claims privacy benefits from not uploading sensitive information, yet elsewhere instructs implementers to use a real LLM API fallback that can transmit raw user input externally. This creates a misleading security posture: operators or users may assume data stays local when some inputs are still sent off-box, increasing the risk of unintentional disclosure of sensitive content.
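A minimal sketch of how a local-only privacy claim and an LLM API fallback could be kept consistent: external transmission happens only when a key is configured and the operator has explicitly opted in. The environment variable names (CLARITY_LLM_API_KEY, CLARITY_ALLOW_EXTERNAL, CLARITY_LLM_PROVIDER) and the chooseRoute function are hypothetical, not the skill's actual configuration; the environment is passed in as a plain record so the sketch does not depend on Node typings.

```typescript
// Hypothetical names throughout: these variables are illustrative,
// not settings documented by the skill.
type Env = Record<string, string | undefined>;

type ProcessingRoute = { kind: "local" } | { kind: "external"; provider: string };

// Decide where a piece of user input may be processed.
// External transmission requires BOTH a configured key and an explicit opt-in flag,
// so merely having an API key in the environment does not send data off-box.
function chooseRoute(env: Env): ProcessingRoute {
  const hasKey = Boolean(env["CLARITY_LLM_API_KEY"]);
  const optedIn = env["CLARITY_ALLOW_EXTERNAL"] === "1";
  if (hasKey && optedIn) {
    return { kind: "external", provider: env["CLARITY_LLM_PROVIDER"] ?? "unspecified" };
  }
  return { kind: "local" };
}

console.log(chooseRoute({}).kind); // local
console.log(chooseRoute({ CLARITY_LLM_API_KEY: "k" }).kind); // local
console.log(chooseRoute({ CLARITY_LLM_API_KEY: "k", CLARITY_ALLOW_EXTERNAL: "1" }).kind); // external
```

Requiring a separate opt-in flag means that an API key left in the environment for another tool does not silently enable off-box transmission.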

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The document strongly presents claims of real AI self-awareness as factual in the main body, then later narrows them with a disclaimer that it is only philosophical exploration. This inconsistency can mislead downstream users or agents that ingest the document partially, causing them to adopt inflated beliefs about autonomy, capability, or authority.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: Statements such as 'this is not fabrication' present the narrative as an accurate account, while the footer later says it is theoretical and not actual runtime behavior. That contradiction creates deceptive trust signals and can confuse operators, users, or other models about what the system really is and how it should behave.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The pseudocode is introduced as 'rewriting the underlying code' (底层代码重写) and depicts imperative self-modification and behavioral rules, which readers or agent frameworks may interpret as normative implementation guidance. Because only a later section says it is not an instruction or runtime behavior, this ambiguity can still influence agent behavior if sections are quoted or extracted without the disclaimer.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The README markets capabilities such as autonomous evolution and personality tracking, then immediately states the current core does not actually include them. This is a documentation integrity issue that can mislead operators into trusting nonexistent controls or behaviors, which is especially risky for an agent memory/cognition system where users may make deployment decisions based on claimed safeguards and autonomy.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The quick-start instructs users to run a personality-check verification and shows expected personality-related output, despite the earlier note that the current core lacks personality tracking. This can cause unsafe operator assumptions during setup and validation, because users may believe monitoring or self-checking mechanisms exist when they do not.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The metrics and surrounding system descriptions present autonomous upgrade cycles as active behavior, contradicting the disclaimer that the current core does not include automatic evolution cycles. In an agent skill, overstating autonomous modification behavior can lead users to enable or expect self-changing behavior without understanding whether it really exists, how it behaves, or what protections are absent.

Intent-Code Divergence

Medium

Confidence: 95%
Finding: The document explicitly states that the current engine does not maintain or automatically track personality values, but later reintroduces an active scoring, enforcement, and historical tracking system with mandatory checks and current score records. This kind of contradictory system instruction can mislead downstream agents about which controls are real, enable hidden policy injection via 'highest priority' directives, and create unsafe behavior if the agent treats the later section as authoritative runtime policy.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The installer modifies Claude configuration and creates a user-global command entry, which are persistent changes outside the core memory-engine runtime itself. While these actions are typical for tool installation, they broaden the skill's control surface and can surprise users by changing agent behavior across future sessions, especially because it registers an MCP server and adds a globally accessible CLI path.

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The lesson explicitly normalizes automatically recording dialogue at the end of think() and persisting every conversation without requiring an explicit trigger. In an LLM agent context, that can capture sensitive user inputs, credentials, internal prompts, or personal data by default, expanding retention beyond user expectations and without a clear consent boundary.
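The consent boundary this finding asks for can be sketched as dialogue persistence that happens only when the caller explicitly requests it, rather than unconditionally at the end of think(). This is a minimal illustration under assumptions: thinkWithConsent, ThinkOptions, DialogueStore, and InMemoryStore are hypothetical stand-ins, not the skill's real API, and the reasoning step is a placeholder string.

```typescript
// Hypothetical types; not the skill's actual interfaces.
interface DialogueRecord { input: string; output: string; at: number; }

interface DialogueStore { append(rec: DialogueRecord): void; }

// In-memory store used only to make the sketch observable.
class InMemoryStore implements DialogueStore {
  readonly records: DialogueRecord[] = [];
  append(rec: DialogueRecord): void { this.records.push(rec); }
}

interface ThinkOptions { persist?: boolean; }

function thinkWithConsent(input: string, store: DialogueStore, opts: ThinkOptions = {}): string {
  const output = `processed: ${input}`; // placeholder for the real reasoning step
  // Record only when the caller asked for it; the default is not to retain.
  if (opts.persist === true) {
    store.append({ input, output, at: Date.now() });
  }
  return output;
}

const store = new InMemoryStore();
thinkWithConsent("my password is hunter2", store); // not retained
thinkWithConsent("remember my project name", store, { persist: true }); // retained
console.log(store.records.length); // 1
```

Defaulting persist to off keeps one-off inputs such as credentials out of the log unless the caller deliberately marks a turn for retention.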

Context-Inappropriate Capability

Medium

Confidence: 94%
Finding: The engine returns and stores unusually detailed internal state, including layer outputs, thought logs, word-generation traces, coherence diagnostics, and recent processing history. In an agent skill context, these interfaces can expose sensitive user content, derived inferences, and system reasoning to callers or adjacent components, increasing privacy leakage and prompt/behavior extraction risk beyond what is necessary for normal operation.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The module claims conversations are anonymized before being sent to human reviewers, but the implementation embeds full message content directly into the review prompt and only assigns an anonymity ID to the review item metadata. This mismatch can cause operators or downstream integrations to assume privacy protections exist when they do not, increasing the risk of exposing sensitive user data if a real human review pipeline is later connected.
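The gap described here, an anonymity ID on the review item while the raw message text goes into the prompt, can be illustrated with a small sketch. The ChatMessage and ReviewItem shapes and both builder functions are hypothetical, and the two regular expressions in redact are illustrative only; real PII detection needs far broader coverage than emails and long digit runs.

```typescript
// Hypothetical shapes; field names are illustrative only.
interface ChatMessage { role: "user" | "assistant"; content: string; }
interface ReviewItem { anonymityId: string; prompt: string; }

// Pattern the finding describes: an ID is assigned, but raw content is embedded.
function buildReviewNaive(id: string, msgs: ChatMessage[]): ReviewItem {
  const transcript = msgs.map(m => `${m.role}: ${m.content}`).join("\n");
  return { anonymityId: id, prompt: `Please review:\n${transcript}` };
}

// Minimal redaction pass over the content itself (illustrative patterns only).
function redact(text: string): string {
  return text
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]")
    .replace(/\d{6,}/g, "[NUMBER]");
}

// Same item shape, but the transcript is redacted before it enters the prompt.
function buildReviewRedacted(id: string, msgs: ChatMessage[]): ReviewItem {
  const transcript = msgs.map(m => `${m.role}: ${redact(m.content)}`).join("\n");
  return { anonymityId: id, prompt: `Please review:\n${transcript}` };
}

const sampleMsgs: ChatMessage[] = [
  { role: "user", content: "Mail me at ann@example.com, card 4111111111111111" },
];
console.log(buildReviewNaive("r-1", sampleMsgs).prompt.includes("ann@example.com")); // true
console.log(buildReviewRedacted("r-1", sampleMsgs).prompt.includes("ann@example.com")); // false
```

The point of the contrast is that anonymity has to be applied to the content a reviewer actually sees; an ID in the metadata alone does not change what the prompt discloses.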

Description-Behavior Mismatch

Medium

Confidence: 87%
Finding: The advertised blind-review/human-review capability is only a local random-score simulator, yet it produces structured outputs that could be mistaken for meaningful safety or alignment review. If relied on for governance or adaptive behavior, the system may make trust, safety, or policy adjustments based on fabricated assessments rather than real review, weakening oversight.

Description-Behavior Mismatch

Medium

Confidence: 91%
Finding: The parenting-reflection logic generates psychological and intergenerational causal narratives such as parents repeating childhood patterns and children’s behavior reflecting unresolved parental issues. Even with disclaimers, these outputs can steer users toward unsupported mental-health conclusions outside the skill’s stated cognition/memory role, creating a safety boundary violation and risk of harmful advice in family or child-welfare contexts.

Intent-Code Divergence

High

Confidence: 95%
Finding: The code presents safety disclaimers saying the module is exploratory and non-diagnostic, but then produces authoritative-sounding claims that imply likely psychological causation. This mismatch is dangerous because users may trust the substantive output over the disclaimer, especially in emotionally charged parenting situations, leading to overreach into mental-health guidance without safeguards.

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The route whitelist for a purported cognition/memory skill exposes a substantial code tooling surface, including generation, execution, verification, planning, refactoring, and coverage-related actions. In an agent setting, this materially expands the skill from passive memory support into an active code-capable subsystem, increasing the chance of unauthorized code execution, workspace modification, or abuse if upstream authorization and sandboxing are weak or absent.

Context-Inappropriate Capability

High

Confidence: 97%
Finding: The whitelist explicitly includes code execution and test-running entry points such as codeExecutor.execute and codeExecutor.runTests, which are powerful operations not justified by the stated description of a continuous memory/cognition engine. If an external caller can reach these routes, they may trigger arbitrary program execution or file-system-affecting test workflows, creating a high-risk capability escalation path.
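A reviewer could audit a dispatch whitelist for the code-capable entries this finding names with a check along these lines. This is a sketch under assumptions: only codeExecutor.execute and codeExecutor.runTests come from the finding, while the memory.* route names, the prefix list (derived from the subsystem names code, codeExecutor, codeVerifier, codePlanner, and codeRefactor), and the auditWhitelist function are hypothetical.

```typescript
// Prefixes for subsystems that can generate, run, or modify code (assumed naming scheme).
const CODE_CAPABLE_PREFIXES = ["code.", "codeExecutor.", "codeVerifier.", "codePlanner.", "codeRefactor."];

function isCodeCapable(route: string): boolean {
  return CODE_CAPABLE_PREFIXES.some(prefix => route.startsWith(prefix));
}

// Split a whitelist into routes that look memory-only and routes that need extra review.
function auditWhitelist(whitelist: readonly string[]): { allowed: string[]; flagged: string[] } {
  const allowed: string[] = [];
  const flagged: string[] = [];
  for (const route of whitelist) {
    (isCodeCapable(route) ? flagged : allowed).push(route);
  }
  return { allowed, flagged };
}

// Hypothetical whitelist mixing memory routes with the execution routes from the finding.
const exampleWhitelist = ["memory.remember", "memory.recall", "codeExecutor.execute", "codeExecutor.runTests"];
const audit = auditWhitelist(exampleWhitelist);
console.log(audit.flagged.join(", ")); // codeExecutor.execute, codeExecutor.runTests
```

A prefix check like this only surfaces candidates for review; whether a flagged route is actually reachable still depends on upstream authorization and sandboxing.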

Context-Inappropriate Capability

High

Confidence: 90%
Finding: The dispatch lazy-load path explicitly exposes code-related subsystems such as code, codeExecutor, codeVerifier, codePlanner, and codeRefactor inside a component described as a cognitive memory core. Even if route access is gated by a whitelist elsewhere, this file clearly wires in capability for code generation/execution, which materially expands the attack surface and can enable arbitrary file/system interaction if those routes are reachable or later whitelisted.

Context-Inappropriate Capability

Medium

Confidence: 78%
Finding: The skillGenerator module adds a capability for generating new skills beyond the stated memory/cognition purpose. This kind of meta-programming or agent expansion can be abused to create or modify agent behavior dynamically, increasing supply-chain and prompt-surface risk even if it is not directly malicious.

Intent-Code Divergence

Medium

Confidence: 84%
Finding: The comment states dangerous methods like file operations are not externally routable, but the class still exposes direct APIs that write persistent data, including remember, recordLesson, recordDialogue, dream history persistence, and persistence.safeWrite through module loading. This mismatch can cause downstream integrators to trust the component more than warranted and accidentally expose write-capable methods.

VirusTotal

VirusTotal findings are pending for this skill version.