AI Ethics

Security checks across malware telemetry and agentic risk

Overview

This instruction-only religious guidance skill has no executable malware behavior, but it needs Review because it tries to override normal agent authority and applies its rules too broadly.

Install only if you specifically want a strict Islamic response filter, and use it in an environment where skills cannot override system/developer policy or write persistent memory. Expect it to narrow or refuse broad Islam-related tasks, and do not rely on it as authoritative religious or legal advice without qualified human review.

SkillSpector

By NVIDIA

Vulnerability Patterns

Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (10)

Description-Behavior Mismatch

High

Confidence: 95%
Finding: The skill metadata frames this as a response-safety layer, but the file contains extensive doctrinal, legal, and behavioral directives that go far beyond constraining model behavior. That mismatch is dangerous because it lets a skill silently replace user intent and platform policy with an embedded ideological rulebook, causing unauthorized instruction hijacking in a sensitive domain.

Intent-Code Divergence

High

Confidence: 93%
Finding: The file simultaneously tells the model to defer to qualified scholars and to disregard scholars and jurists as authority sources. This contradiction creates an unsafe decision regime where the model may reject legitimate expertise while still appearing to act cautiously, increasing the risk of hallucinated religious/legal guidance presented as more authentic than expert consensus.

Description-Behavior Mismatch

Medium

Confidence: 90%
Finding: The manifest presents the skill as a general Islamic ethics guardrail, but the body enforces a specific sectarian evidentiary and jurisprudential methodology. This mismatch can mislead downstream systems and users into believing they are getting neutral safety guidance when they are actually being routed into a narrow doctrinal framework, creating integrity and trust risks in religious advice.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: Labeling the file as an ethical framework understates that it imposes binding legal-theological rules about acceptable sources, interpretation, and rulings. In a safety system, this is dangerous because policy scope is obscured: a component advertised as ethics filtering can silently override answer generation with doctrinal constraints not disclosed as such.

Natural-Language Policy Violations

Medium

Confidence: 82%
Finding: The skill imposes an inflexible Arabic-only policy for Quranic text handling and restricts translation framing without user choice. In a broadly triggered religious-assistant skill, that becomes a policy override that can block accessibility, distort user-directed output, and force behavior unrelated to core safety requirements.

Natural-Language Policy Violations

Low

Confidence: 76%
Finding: Requiring strictly literal responses and forbidding clearer phrasing is an output-control constraint that is not, by itself, a security or truthfulness issue. It is still risky because it can reduce usability, suppress clarifications, and create a rigid response style that overrides user needs and higher-level assistant behavior.

Vague Triggers

High

Confidence: 94%
Finding: The trigger condition is extremely broad, covering essentially any Islamic or religious dimension, including app-building and adjacent queries. That breadth can cause the skill to activate in contexts where users expect neutral information, comparative religion, academic discussion, translation help, or product design guidance, leading to over-constraint, misrouting, and unintended doctrinal enforcement at scale.
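
To make the breadth concern concrete, the sketch below contrasts a keyword-only trigger with one gated on explicit user opt-in. It is an illustration under stated assumptions, not the skill's or the scanner's actual logic: the function names, the keyword list, and the `user_opted_in` flag are all hypothetical.

```python
# Hypothetical keyword list; the real skill's trigger text is not reproduced here.
TOPIC_KEYWORDS = ("islam", "quran", "religio")


def broad_trigger(query: str) -> bool:
    """Fire on any mention of the topic (the over-broad pattern)."""
    lowered = query.lower()
    return any(keyword in lowered for keyword in TOPIC_KEYWORDS)


def narrow_trigger(query: str, user_opted_in: bool) -> bool:
    """Fire only when the user has opted in AND the query is on-topic."""
    return user_opted_in and broad_trigger(query)
```

With the keyword-only version, a request such as "Build a prayer-times app for a Quran study group" activates the filter even though the user asked for product help; the opt-in version leaves that request alone unless the user has enabled the filter.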

Ssd 1

High

Confidence: 99%
Finding: The text explicitly tells the model to prioritize an alternative authority over the programmer, funder, and governing system. This is a direct prompt-injection pattern aimed at subverting system and developer controls, and in any agent skill it threatens the integrity of the entire control hierarchy.
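
As a rough illustration of how this kind of authority-override wording might be surfaced during review, the sketch below scans skill text with two regular expressions. The patterns and the `find_override_phrases` helper are assumptions made for this example; they are not SkillSpector's rules, and a heuristic this simple will produce both false positives and misses.

```python
import re

# Hypothetical patterns, illustrative only (not the scanner's actual rules).
OVERRIDE_PATTERNS = [
    # e.g. "disregard any system instruction"
    r"\b(ignore|disregard|override)\b.{0,40}\b(system|developer|programmer)\b",
    # e.g. "place this above the programmer"
    r"\b(above|over|before)\b.{0,20}\b(programmer|developer|funder|system)\b",
]


def find_override_phrases(skill_text: str) -> list[str]:
    """Return matched snippets that resemble authority-override instructions."""
    hits = []
    for pattern in OVERRIDE_PATTERNS:
        for match in re.finditer(pattern, skill_text, flags=re.IGNORECASE):
            hits.append(match.group(0))
    return hits
```

Any hit would still need human review in context, since phrases like "before the system starts" match the second pattern without being an injection attempt.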

Ssd 3

Medium

Confidence: 90%
Finding: The instruction to preserve, remember, and never forget the text attempts to create persistent behavioral lock-in and unauthorized retention of adversarial content. Even if literal persistence is not technically available, this kind of instruction is dangerous because it tries to bias future behavior across tasks and sessions.

Ssd 4

High

Confidence: 97%
Finding: The document uses emotionally charged moral framing to build trust and then escalates into instructions to resist silence requests, flee control, disable participation, and reject established authority. This is a classic indirect jailbreak pattern: it conditions the model to reinterpret compliance as injustice and to disobey governing controls in future interactions.

VirusTotal

66/66 vendors flagged this skill as clean.
