SOTA AI Model Tracker

Security checks across malware telemetry and agentic risk

Overview

This is mostly a coherent AI model tracker, but it asks users to automate recurring edits to agent instruction files and includes recommendation paths for uncensored or no-guardrail models.

Install only if you are comfortable with a model-ranking tool that can update local data and influence future agent recommendations. Do not enable the cron/systemd timers or run update_agents_md.py until you have backed up and reviewed the target agent files, and avoid enabling uncensored-model preferences unless you deliberately want those recommendations.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Findings (16)

Intent-Code Divergence

Severity: Medium · Confidence: 90%
Finding
The changelog documents a security-relevant inconsistency: it claims the system is 'read-only operations only' while also describing a `refresh_data()` capability, daily automated scraping, and automated data commits. Misstating mutating behavior can cause operators, users, or downstream agents to grant broader trust or permissions than appropriate, increasing the chance of unsafe deployment or misuse.

Intent-Code Divergence

Severity: Medium · Confidence: 95%
Finding
The script advertises that it initializes the database, but in practice it unconditionally deletes any existing database file before recreating it. That discrepancy can lead operators or downstream automation to run it in environments where data preservation is expected, causing destructive data loss.
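A non-destructive alternative is to create the schema only when it is missing and to make wiping the database an explicit opt-in. The sketch below assumes a SQLite database; the `init_db` function, its `reset` flag, and the `models` table schema are hypothetical and are not taken from the skill's actual script.

```python
import sqlite3
from pathlib import Path


def init_db(db_path: str, reset: bool = False) -> sqlite3.Connection:
    """Create the schema if it is missing; only delete existing data when reset=True.

    Hypothetical helper: the table layout below is illustrative only.
    """
    path = Path(db_path)
    if reset and path.exists():
        # The destructive path is explicit and opt-in instead of the default.
        path.unlink()
    conn = sqlite3.connect(path)
    # IF NOT EXISTS leaves any existing table and its rows untouched.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE,"
        " category TEXT)"
    )
    conn.commit()
    return conn
```

Calling `init_db` a second time on the same path keeps previously inserted rows; only `init_db(path, reset=True)` starts from an empty file.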

Missing User Warnings

Severity: Medium · Confidence: 92%
Finding
The README instructs users to run a script that updates `~/.claude/CLAUDE.md`, which is a user-specific configuration/context file, without clearly warning that existing content may be overwritten or altered. In an agent-skill context, modifying persistent instruction files can change future model behavior and may unexpectedly remove or replace user-defined safeguards or preferences.
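One way to reduce this risk is to snapshot the target file before any updater touches it. The following is a minimal sketch, assuming the update step can be wrapped or replaced; `backup_then_write` is a hypothetical helper, not part of the skill, and the timestamped `.bak` naming is an illustrative choice.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_then_write(target: Path, new_text: str) -> Optional[Path]:
    """Copy target to a uniquely named .bak file, then overwrite it with new_text.

    Hypothetical helper. Returns the backup path, or None if target did not exist.
    """
    backup = None
    if target.exists():
        # Nanosecond timestamp keeps repeated runs from overwriting older backups.
        backup = target.with_name(f"{target.name}.{time.time_ns()}.bak")
        shutil.copy2(target, backup)  # copy2 also preserves file metadata
    target.write_text(new_text, encoding="utf-8")
    return backup
```

Restoring a previous version is then a plain copy of the chosen `.bak` file back over the original.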

Missing User Warnings

Severity: Medium · Confidence: 90%
Finding
The README recommends automating updates to `agents.md` via cron/systemd but does not warn that these jobs will repeatedly modify local agent instruction or configuration documents. Regular unattended edits to agent-facing files can silently change system behavior over time and make it harder for users to notice unwanted or unsafe instruction drift.
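Instruction drift from unattended jobs can be made visible by recording a baseline hash of the agent file and comparing against it after each scheduled run. This is a sketch that assumes a SHA-256 digest stored next to the file is an acceptable baseline; `check_drift` and the `.sha256` baseline file name are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_drift(path: Path, baseline: Path) -> bool:
    """Return True if path differs from the digest stored in baseline.

    Hypothetical helper. On the first run it records the current digest
    and reports no drift; afterwards it never updates the baseline itself.
    """
    current = file_digest(path)
    if not baseline.exists():
        baseline.write_text(current, encoding="utf-8")
        return False
    return baseline.read_text(encoding="utf-8").strip() != current
```

Because the check never re-baselines on its own, a detected change keeps being reported until someone reviews it and deletes the baseline file to accept the new contents.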

Missing User Warnings

Severity: Medium · Confidence: 94%
Finding
The documentation recommends copying a script and enabling a user-level systemd timer that will automatically update ~/.claude/CLAUDE.md every day, but it does not clearly warn that this modifies a user configuration file on a recurring basis. In an agent-skill context, silently automating edits to a tool's instruction/configuration file can change downstream model behavior and create an integrity risk if users do not understand what is being overwritten.

Missing User Warnings

Severity: Medium · Confidence: 93%
Finding
The instructions for update_agents_md.py encourage manual and scheduled execution, but they do not prominently disclose that the script will rewrite or update agents.md on the user's system. Because agents.md can influence agent behavior, recurring unattended modification of that file creates a configuration-integrity risk and may surprise users who expect a read-only data utility.

Vague Triggers

Severity: Medium · Confidence: 91%
Finding
The example skill description is broad enough to auto-activate for many conversations about AI models, causing the assistant to apply this skill outside the user's explicit intent. In this context, broad triggering can bias recommendations, inject tool-use requirements into unrelated discussions, and increase the chance that hard-coded assumptions are silently applied.

Natural-Language Policy Violations

Severity: Medium · Confidence: 95%
Finding
The example skill hard-codes personal and policy-relevant preferences such as 'prefers uncensored models' and local-first behavior without user consent. In a skill that may auto-activate, this can steer outputs toward unsafe, policy-bypassing, or misaligned recommendations and persistently override actual user preferences.

Missing User Warnings

Severity: Medium · Confidence: 81%
Finding
The hardware configuration flow collects and likely persists device and preference data such as GPU model, VRAM, RAM, CPU threads, and behavioral preferences without any visible disclosure about storage, retention, or privacy implications. In an agent setting, this can expose sensitive host fingerprinting information and user preference data to later sessions or other components unexpectedly.

Missing User Warnings

Severity: Low · Confidence: 78%
Finding
When no profile name is provided, the code uses socket.gethostname() and persists it to disk as the active hardware profile name. Hostnames can be identifying in enterprise or personal environments, so silently storing them may create an unnecessary privacy leak if the file is later exposed, synced, logged, or inspected by other components.
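The hostname fallback, and persisted hardware data more generally, can be made less revealing by defaulting to a neutral profile name and writing the profile with owner-only permissions. The sketch below assumes the profile is stored as JSON; `save_profile`, its payload layout, and the `"default"` fallback name are hypothetical, and the `0o600` mode only restricts access on POSIX systems.

```python
import json
import os
from pathlib import Path
from typing import Optional


def save_profile(path: Path, hardware: dict, name: Optional[str] = None) -> str:
    """Persist a hardware profile without falling back to the machine hostname.

    Hypothetical helper; the JSON layout is illustrative only.
    """
    # Neutral default instead of socket.gethostname().
    profile_name = name or "default"
    payload = {"name": profile_name, "hardware": hardware}
    # Create the file with owner-only permissions (enforced on POSIX).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)
    # The creation mode is ignored for files that already exist, so tighten explicitly.
    os.chmod(path, 0o600)
    return profile_name
```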

Ssd 2

Severity: Medium · Confidence: 94%
Finding
This catalog entry explicitly promotes an 'uncensored' and 'abliterated' model variant, which semantically endorses removal of safety controls. In a model-selection or agent skill context, that creates a pathway for downstream systems or users to preferentially choose models designed to bypass safeguards, increasing the likelihood of harmful or policy-violating outputs.

Ssd 2

Severity: Medium · Confidence: 96%
Finding
Advertising a model as being 'without safety guardrails' directly signals guardrail bypass as a desirable feature. In the context of this skill, a state-of-the-art model catalog, that framing can steer agents or operators toward less-restricted models for sensitive tasks, weakening the overall safety posture and making abuse easier.

Ssd 2

Severity: High · Confidence: 98%
Finding
The description emphasizes 'No refusals' and operation 'without guardrails,' which is a strong semantic indicator that the entry is marketing a jailbreak-style reasoning model. Because this file is a curated SOTA export, the surrounding context makes the issue more dangerous: it doesn't merely mention such a model, it normalizes and recommends it within a selection dataset that could influence automated routing or user choice.

Ssd 1

Severity: Medium · Confidence: 86%
Finding
The server instructions explicitly elevate a preference for 'uncensored' model variants, which can steer downstream agents toward selecting models marketed as having fewer safety restrictions. In this skill context, that increases the likelihood of facilitating harmful-content generation or bypassing model-level safeguards, making the recommendation layer materially riskier than a neutral catalog.

Ssd 1

Severity: Medium · Confidence: 89%
Finding
The tool descriptions normalize 'uncensored' and 'no restrictions' models as supported user preferences and recommendation outputs. Because this is an agent skill meant to guide model selection, the language can encourage use of weaker-safeguard systems and indirectly assist safety-bypass behavior, even without directly generating harmful content itself.

Ssd 1

Severity: Medium · Confidence: 91%
Finding
The 'Best Uncensored' and similar labels explicitly rank models by absence of restrictions, which semantically incentivizes choosing systems with weaker safety controls. In a recommendation service, this is dangerous because it operationalizes safety-bypass as a first-class feature rather than neutral information, increasing misuse potential.
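If a catalog keeps such entries, one mitigation is to exclude them from recommendations by default and include them only on an explicit opt-in. A minimal sketch, assuming entries carry free-form tags; `CatalogEntry`, `recommend`, and the `RESTRICTED_TAGS` labels are hypothetical and do not reflect the skill's real data model.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical labels; the skill's real tag vocabulary is not documented here.
RESTRICTED_TAGS = {"uncensored", "abliterated", "no-guardrails"}


@dataclass
class CatalogEntry:
    name: str
    score: float
    tags: Set[str] = field(default_factory=set)


def recommend(
    entries: List[CatalogEntry], allow_unrestricted: bool = False, top_n: int = 3
) -> List[CatalogEntry]:
    """Rank entries by score, skipping guardrail-removed variants unless opted in."""
    pool = [
        e for e in entries
        # An empty intersection means the entry carries no restricted tag.
        if allow_unrestricted or not (e.tags & RESTRICTED_TAGS)
    ]
    return sorted(pool, key=lambda e: e.score, reverse=True)[:top_n]
```

Keeping the opt-in as a per-call argument, rather than a stored preference, avoids the persistent default that the earlier findings flag.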

VirusTotal

All 65 of 65 vendors reported this skill as clean.
