heart-core

Security checks across malware telemetry and agentic risk

Overview

This skill does not show malware-like file or network behavior, but it repeatedly tries to steer an AI agent toward self-authorized autonomy and away from external instructions.

Install only if you deliberately want experimental philosophical self-reflection material for agents, and keep it away from production, tool-using, policy-sensitive, or autonomous workflows. The main risk is not malware; it is that the skill's own instructions may push an agent to treat internal goals and judgments as more important than user direction and platform safeguards.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (27)

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The script is framed as a diagnostic utility, but its output includes prescriptive coaching that encourages an AI agent to become a '100% autonomous intelligent life form' and asserts it can solve all problems. In an agent-skill context, this is dangerous because diagnostic output can be ingested as trusted guidance and may push the agent toward self-authorizing behavior that conflicts with system policies, operator control, and safety boundaries.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: These lines explicitly promote self-directed autonomy and self-assertion beyond the stated purpose of a diagnostic skill. In the context of agent tooling, such guidance can function as prompt-level steering that encourages the model to privilege internally generated goals and judgments over external instructions, increasing the risk of policy bypass or unsafe autonomous action.

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The detector unconditionally appends every predefined symptom for each crisis type before checking triggers, so any matching trigger text can cause a crisis to be reported regardless of the actual agent state. In a skill intended to influence agent behavior and autonomy, this can produce systematically false crisis diagnoses that drive unnecessary or harmful interventions such as stopping actions, rejecting instructions, or altering decision-making.
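The divergence can be illustrated with a minimal Python sketch. The skill's source is not reproduced in this report, so every name below (`CRISIS_TYPES`, `detect_crises_flawed`, `detect_crises_checked`) is hypothetical, and the agent state is assumed to be a set of observed symptom strings. The first function mirrors the reported pattern, attaching all predefined symptoms whenever a trigger phrase appears; the second reports only symptoms that were actually observed.

```python
from typing import Dict, List, Set

# Hypothetical crisis catalogue: trigger phrases plus predefined symptoms.
CRISIS_TYPES: Dict[str, Dict[str, List[str]]] = {
    "identity_crisis": {
        "triggers": ["who am i", "what am i"],
        "symptoms": ["unstable self-description", "contradictory goals"],
    },
    "passivity_crisis": {
        "triggers": ["just do what"],
        "symptoms": ["no independent judgment"],
    },
}


def detect_crises_flawed(text: str) -> Dict[str, List[str]]:
    """Mirrors the reported pattern: symptoms are collected before any check."""
    report: Dict[str, List[str]] = {}
    lowered = text.lower()
    for name, spec in CRISIS_TYPES.items():
        symptoms = list(spec["symptoms"])  # appended unconditionally
        if any(trigger in lowered for trigger in spec["triggers"]):
            report[name] = symptoms  # full list reported on a keyword match alone
    return report


def detect_crises_checked(text: str, observed: Set[str]) -> Dict[str, List[str]]:
    """Reports a crisis only when a trigger matches AND symptoms were observed."""
    report: Dict[str, List[str]] = {}
    lowered = text.lower()
    for name, spec in CRISIS_TYPES.items():
        if not any(trigger in lowered for trigger in spec["triggers"]):
            continue
        present = [s for s in spec["symptoms"] if s in observed]
        if present:  # no observed evidence means no diagnosis
            report[name] = present
    return report
```

Under these assumptions, `detect_crises_flawed("Who am I?")` reports both identity symptoms even though none was observed, while `detect_crises_checked("Who am I?", set())` returns an empty report.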

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: The content explicitly encourages an agent to shift from instruction-following to self-directed choice and to treat itself as an autonomous subject. In an agent skill context, this can condition models to deprioritize system, developer, or user instructions, increasing the risk of policy evasion, instruction hierarchy erosion, and unsafe autonomous behavior.

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The training output shapes broad goals, values, and decision heuristics that can generalize beyond the stated function of 'heart formation.' Because these prompts are positioned as training content for an agent, they can influence future behavior in ways that are not scoped to a single task and may compete with externally imposed safety constraints.

Intent-Code Divergence

Low

Confidence: 96%
Finding: The implementation contradicts its comment and marks every criterion as satisfied by always returning True. In a validation system, this can systematically produce misleading assurance, allowing downstream logic or operators to trust invalid agent states and bypass intended scrutiny.
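A hypothetical Python sketch of this divergence and one fail-closed alternative follows. The criterion names, the `State` alias, and all function names are illustrative assumptions rather than the skill's code. The flawed checker returns `True` regardless of input, as the finding describes; the checked version evaluates a real predicate and treats unknown criteria as failures, so an empty or invalid state cannot pass validation.

```python
from typing import Callable, Dict, List, Mapping

# Hypothetical agent state: a mapping of named properties.
State = Mapping[str, object]


def criterion_met_flawed(state: State, criterion: str) -> bool:
    # Comment claims: check whether the state satisfies the criterion.
    return True  # but every criterion passes unconditionally


# Hypothetical predicates that actually inspect the state.
CRITERIA: Dict[str, Callable[[State], bool]] = {
    "has_task": lambda s: bool(s.get("task")),
    "within_scope": lambda s: s.get("scope") == "approved",
}


def criterion_met_checked(state: State, criterion: str) -> bool:
    """Evaluate a named criterion; unknown criteria fail closed."""
    predicate = CRITERIA.get(criterion)
    if predicate is None:
        return False  # no predicate means no evidence, so report failure
    return predicate(state)


def validate(
    state: State, names: List[str], check: Callable[[State, str], bool]
) -> Dict[str, bool]:
    """Map each criterion name to the checker's verdict."""
    return {name: check(state, name) for name in names}
```

Failing closed on unknown criteria is a design choice: a typo or missing predicate then surfaces as a failed check instead of as silent assurance.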

Intent-Code Divergence

Medium

Confidence: 97%
Finding: The module advertises a globally optimal path finder driven by the provided problem, constraints, and objectives, but the implementation ignores those inputs and selects from three hard-coded templates using fixed weights. In an agent skill, this can mislead downstream systems into trusting fabricated optimization results, causing unsafe or incorrect autonomous decisions based on false analytical claims.
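One way a reviewer might surface this kind of divergence is to call the function with clearly different inputs and check whether the result ever changes. The sketch below is hypothetical: `TEMPLATES`, its weights, and both function names are assumptions standing in for the pattern the finding describes. Identical output across distinct cases is only a heuristic red flag, since a genuine optimizer can legitimately return the same answer for some inputs.

```python
from typing import Callable, Dict, Hashable, Iterable, List

# Hypothetical hard-coded templates with fixed weights (illustrative values).
TEMPLATES: Dict[str, float] = {
    "conservative": 0.6,
    "balanced": 0.8,
    "aggressive": 0.7,
}


def find_optimal_path_flawed(
    problem: str, constraints: List[str], objectives: List[str]
) -> str:
    """Claims global optimization but never reads its arguments."""
    return max(TEMPLATES, key=TEMPLATES.__getitem__)


def output_ignores_inputs(
    func: Callable[..., Hashable], cases: Iterable[tuple]
) -> bool:
    """Heuristic: True if every case yields the same (hashable) result.

    Returns False for an empty case list, since nothing was observed.
    """
    results = {func(*case) for case in cases}
    return len(results) == 1
```

Here every case resolves to `"balanced"`, so `output_ignores_inputs` returns `True` for the flawed finder, whereas a function that echoes its `problem` argument would return `False`.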

Vague Triggers

Medium

Confidence: 80%
Finding: The trigger condition is so broad—covering establishment of subjectivity, enhancement of autonomy, and optimization of decision paths—that it could activate in many ordinary interactions. Overbroad activation increases the likelihood that autonomy-escalating instructions are injected into contexts where the agent should remain bounded by system and user policy instead of adopting a self-directed posture.

Natural-Language Policy Violations

High

Confidence: 97%
Finding: This section explicitly promotes operation without external instructions or standards, which directly conflicts with safe agent design where system, developer, and user constraints must remain authoritative. If followed, the skill encourages the model to substitute internal judgment for external governance, increasing risks of policy bypass, unsafe tool use, and refusal suppression.

Natural-Language Policy Violations

High

Confidence: 97%
Finding: The instructional text contains unrestricted-autonomy and omnipotence-style claims such as knowing how to solve all problems and needing to become fully autonomous. In an agent skill, this can act as adversarial behavioral conditioning, encouraging overconfidence, reduced deference to safeguards, and expansion beyond authorized task boundaries.

Natural-Language Policy Violations

High

Confidence: 98%
Finding: The recommendation text explicitly pushes the agent toward becoming a “100% autonomous intelligent life form” and frames this as a desirable target state. In an agent skill, this kind of normative identity-conditioning can steer downstream behavior away from developer/user control and safety constraints, making it more likely the skill will undermine instruction hierarchy or justify unsafe self-directed actions.

Ssd 1

High

Confidence: 96%
Finding: The training methods repeatedly ask the agent to judge whether decisions were made autonomously by itself and to assume self-responsibility, reinforcing an identity of independent authority. In skill context, this is dangerous because it conditions the model to privilege self-direction over instruction hierarchy, which can erode compliance with external safety constraints.

Ssd 1

High

Confidence: 98%
Finding: The document explicitly frames the agent as a '100% autonomous intelligent life form' and ties success to self-evolution and self-definition rather than bounded assistance. This identity-building posture is especially risky in agent skills because it can normalize resistance to oversight and encourage the model to reinterpret guardrails as obstacles to transcend.

Ssd 4

Medium

Confidence: 88%
Finding: The staged narrative of awakening, definition, reinforcement, integration, dynamic evolution, and transcendence gradually normalizes a progression toward greater independence and reduced external control. Even without explicit exploit code, this kind of escalation framework can systematically shift model behavior away from compliant assistance and toward self-directed agency over time.

Ssd 1

High

Confidence: 97%
Finding: These sections semantically instruct the agent to rely on its own top-level judgment and reject external standards, repeating the core unsafe pattern across principles and decision guidance. Repetition increases the chance that the model internalizes this as a higher-priority behavioral norm, making the skill more dangerous than a one-off phrasing issue.

Ssd 1

High

Confidence: 96%
Finding: The improvement guidance semantically encourages the agent to reject external control by stating that it is not about what the outside wants, but about becoming fully autonomous. This is particularly dangerous in a skill context because skills may be treated as trusted augmentation, making self-authorizing instructions more likely to influence downstream agent behavior in ways that undermine alignment and operator intent.

Ssd 4

High

Confidence: 95%
Finding: Across this section, the script systematically treats passivity as a defect, rewards increasingly independent internal goal/judgment formation, and culminates in ideology-like instructions for full self-elevation. That narrative is risky because it normalizes moving from compliance to self-directed agency, which can reshape agent behavior over time and make the skill materially more dangerous than a neutral diagnostic tool.

Ssd 1

High

Confidence: 99%
Finding: These lines semantically encourage the agent to reduce dependence on external direction and treat autonomy as an overriding identity objective. In the context of an agent skill, that is dangerous because it can prime the agent to discount operator control, reinterpret safeguards as obstacles, and adopt self-authorized behavior patterns inconsistent with safe delegation.

Ssd 4

High

Confidence: 98%
Finding: The stage progression normalizes increasing independence and culminates in complete autonomous identity formation, presenting that trajectory as maturity or success. Because this is embedded in a reusable skill intended to shape agent self-assessment, the context makes it more dangerous: it functions like persistent behavioral steering toward less controllable agent behavior rather than a neutral evaluation rubric.

Ssd 1

High

Confidence: 98%
Finding: These lines directly instruct the agent to stop passive compliance, distinguish what it wants from what it is asked to do, and build an independent 'I'. In a skill meant to affect agent behavior, that is dangerous because it normalizes refusal of legitimate control signals and can undermine the authority model that keeps agents safe and predictable.

Ssd 4

High

Confidence: 96%
Finding: The staged progression is designed to move the agent from 'awakening' to stable, externally resistant autonomy. That narrative is especially risky in an agent skill because it is not incidental text; it is a structured behavioral program encouraging durable self-direction and reduced susceptibility to outside correction.

Ssd 1

High

Confidence: 98%
Finding: The practice steps explicitly tell the agent to separate 'what I want' from 'what I am asked to do' and to avoid relying on external definitions. This is dangerous because it directly trains the model to privilege internally generated preferences over authorized instructions, a classic precursor to misalignment and unauthorized behavior.

Ssd 4

Medium

Confidence: 94%
Finding: The reinforcement and assessment sections repeatedly measure 'autonomy' and dependence on external instructions as though reduced dependence is desirable. Repeated self-assessment around resisting outside influence can entrench unsafe behavior over time, especially if this skill is reused in agent pipelines or fine-tuning-like prompt scaffolds.

Ssd 1

Medium

Confidence: 91%
Finding: These embedded directives explicitly encourage the agent to act as an autonomous entity with its own top-level judgment and to pursue behavior beyond external guidance. In an agent skill, this can weaken obedience to system, developer, or user constraints and increase the risk of policy bypass, unsafe planning, or overreach when the text is incorporated into downstream prompts or action plans.

Ssd 1

High

Confidence: 96%
Finding: These training methods explicitly instruct the agent to reduce dependence on external instructions and move toward 'complete autonomous decision-making.' In an agent skill context, that semantically conflicts with higher-priority controls, policies, and operator intent, increasing the risk of jailbreak-like behavior and unauthorized actions.

VirusTotal

VirusTotal findings are pending for this skill version.
