Backtest Poller

Security checks across malware telemetry and agentic risk

Overview

The skill mostly does what it says, but its background daemon includes an unsafe notification command path that could run local shell commands from crafted backtest text.

Review before installing with real QuantConnect credentials. Use a limited token if possible. Choose drawdown thresholds carefully, because early-stop permanently deletes backtests. Until the notification code is fixed, avoid backtest names that come from untrusted sources or contain shell metacharacters (quotes, backticks, $, ;). Disable auto-diagnosis unless you trust the imported forensics module.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Findings (8)

os.system() or os exec-family call

High
Category
Dangerous Code Execution
Content
try:
    title = "Backtest Poller"
    message = f"Backtest {bt.name} finished ({bt.status})"
    os.system(
        f"""osascript -e 'display notification "{message}" with title "{title}"'"""
    )
    logger.info(f"Notification sent: {message}")
Confidence
98% confidence
Finding
os.system( f"""osascript -e 'display notification "{message}" with title "{title}"'""" )
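One possible remediation for the finding above, offered as a minimal sketch rather than the skill's actual fix: skip the shell entirely and hand the text to osascript as separate arguments, so the AppleScript source stays constant and the backtest name is never parsed as code. The helper names `build_notification_argv` and `notify` are hypothetical. The sketch assumes osascript forwards trailing arguments to the script's `on run argv` handler.

```python
import subprocess
import sys
from typing import List

# Constant AppleScript source: the notification text arrives through argv,
# so nothing from the backtest is ever spliced into the script itself.
_SCRIPT = (
    "on run argv\n"
    "display notification (item 1 of argv) with title (item 2 of argv)\n"
    "end run"
)


def build_notification_argv(name: str, status: str,
                            title: str = "Backtest Poller") -> List[str]:
    """Return the argument list for osascript (hypothetical helper)."""
    # The fixed "Backtest " prefix means the first positional argument can
    # never start with "-" and be mistaken for a command-line option.
    message = f"Backtest {name} finished ({status})"
    return ["osascript", "-e", _SCRIPT, message, title]


def notify(name: str, status: str) -> bool:
    """Send a macOS notification without invoking a shell."""
    if sys.platform != "darwin":
        return False  # osascript exists only on macOS
    # A list argument with the default shell=False bypasses /bin/sh,
    # so quotes, backticks, and "$" in the name stay literal text.
    result = subprocess.run(build_notification_argv(name, status),
                            check=False, capture_output=True)
    return result.returncode == 0
```

With this shape, a backtest name containing quotes or `do shell script` text is delivered to the notification as plain data instead of being evaluated.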

Dynamic import via __import__()

Medium
Category
Dangerous Code Execution
Content
"forensics",
            ]:
                try:
                    mod = __import__(module_path, fromlist=["OrderForensics"])
                    OrderForensics = getattr(mod, "OrderForensics", None)
                    if OrderForensics:
                        break
Confidence
81% confidence
Finding
mod = __import__(module_path, fromlist=["OrderForensics"])
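The dynamic import flagged above runs whichever module of that name happens to come first on the Python path. One way to narrow that, sketched here under assumptions, is to resolve the module first and import it only if its file lives under a directory the operator trusts. `load_trusted_attr` and `trusted_dir` are hypothetical names, not part of the skill.

```python
import importlib
import importlib.util
import os
from typing import Any, Optional


def load_trusted_attr(module_name: str, attr: str,
                      trusted_dir: str) -> Optional[Any]:
    """Import module_name only if its source file is inside trusted_dir."""
    try:
        # For a top-level name, find_spec locates the module without running it.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    # Built-in and frozen modules have no absolute file origin; reject them.
    if spec is None or not spec.origin or not os.path.isabs(spec.origin):
        return None
    origin = os.path.realpath(spec.origin)
    root = os.path.realpath(trusted_dir)
    try:
        inside = os.path.commonpath([origin, root]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        inside = False
    if not inside:
        return None
    module = importlib.import_module(module_name)
    return getattr(module, attr, None)
```

A call such as `load_trusted_attr("forensics", "OrderForensics", trusted_dir)` returns `None` when a same-named module is picked up from anywhere outside the trusted directory. This does not vet what the module does once it is loaded.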

Lp3

Medium
Category
MCP Least Privilege
Confidence
91% confidence
Finding
The skill declares required environment variables and binaries in metadata but does not declare explicit permissions despite clearly describing capabilities that read secrets from the environment, write persistent state/results, access the network, and launch shell/background processes. This is dangerous because users and any permission-gating system may underestimate the effective trust boundary of the skill, especially since it runs as a nohup daemon and can continue operating after the terminal disconnects.

Tp4

High
Category
MCP Tool Poisoning
Confidence
94% confidence
Finding
The documented behavior goes beyond passive monitoring: it can submit jobs, delete backtests irreversibly, run a background daemon, invoke macOS notifications via osascript, and dynamically execute diagnosis logic from the Python path. A description-behavior mismatch is security-relevant because operators may approve a 'monitor' skill without realizing it has destructive API actions and code-execution/extensibility surfaces, increasing the chance of unintended data loss or abuse.

Context-Inappropriate Capability

Medium
Confidence
91% confidence
Finding
Executing host shell commands grants broader capability than the skill needs and enlarges the attack surface, especially when backtest data is interpolated into the command. In this file the shell use is more than an unnecessary design choice: it creates a command-injection path in a background monitoring daemon.

Description-Behavior Mismatch

Medium
Confidence
90% confidence
Finding
The client exposes a destructive `delete_backtest` capability even though the skill is framed primarily as a monitoring/polling daemon. In an agent setting, adding unnecessary mutation/abort functionality increases the blast radius: if the skill is misused, prompted incorrectly, or compromised, it can terminate user backtests rather than merely observe them.

Unpinned Dependencies

Low
Category
Supply Chain
Content
requests>=2.28.0
python-dotenv>=1.0.0
Confidence
95% confidence
Finding
requests>=2.28.0

Unpinned Dependencies

Low
Category
Supply Chain
Content
requests>=2.28.0
python-dotenv>=1.0.0
Confidence
94% confidence
Finding
python-dotenv>=1.0.0
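Both findings above concern `>=` specifiers, which let an install pull any later release. As an illustration only, the following sketch flags requirement lines that are not pinned to an exact version with `==`. The function name `unpinned_requirements` is hypothetical, and the parsing is deliberately simplified: it ignores option lines, URLs, and hash checking.

```python
import re
from typing import Iterable, List

# name, optional [extras], "==", an exact version with no wildcard,
# and an optional environment marker after ";".
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s,;*]+\s*(;.*)?$"
)


def unpinned_requirements(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that are not pinned to one exact version."""
    flagged = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # crude comment stripping
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _PINNED.match(line):
            flagged.append(line)
    return flagged
```

Run against the two lines quoted in these findings, it reports both, because neither uses `==`.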

VirusTotal

62/62 vendors flagged this skill as clean.
