Autoimprove

Security checks across malware telemetry and agentic risk

Overview

This is a transparent autonomous optimization skill, but it can edit repositories, run shell commands, create commits, and hard-reset failed experiments, so users should review it carefully before installing.

Install only if you intentionally want an agent to modify a repo and run its configured commands. Use a clean dedicated branch, worktree, disposable clone, or container; commit or stash existing work first; inspect improve.md and any exported program.md; strip production credentials and avoid production kube/database/cloud contexts; run interactively before any headless or overnight use.

SkillSpector

By NVIDIA

Vulnerability Patterns

Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Tool Misuse: Tool Parameter Abuse, Chaining Abuse, Unsafe Defaults
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (11)

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The skill explicitly states it will run arbitrary shell commands from improve.md and inherit whatever credentials are present, which creates a real command-execution and privilege-exposure risk. Even though this is disclosed as a prerequisite/security note, the surrounding skill positions itself as a bounded optimization tool, so users may underestimate that checks and scoring commands can reach external systems, mutate infrastructure, or exfiltrate secrets.

Intent-Code Divergence

Medium

Confidence: 94%
Finding: The document promises that only resolved scope files will be modified and that the list is locked, but later admits this is only a policy constraint with no technical enforcement. That mismatch is dangerous because autonomous code-modification loops can drift into out-of-scope files, including tests, CI, or security-sensitive assets, while the user believes hard containment exists.
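
To illustrate what technical enforcement of the scope lock could look like, here is a minimal sketch. It is not part of the skill: the function name `out_of_scope`, the glob patterns, and the file names are all hypothetical, and it assumes the locked scope is expressed as shell-style globs. A wrapper could run a check like this on the list of changed files before each commit and abort if anything is returned.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, scope_patterns):
    """Return the changed paths that match none of the locked scope patterns.

    Note: in fnmatch, `*` also matches `/`, so "src/*.py" covers nested paths.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in scope_patterns)
    ]


# Hypothetical scope and change set, for illustration only.
scope = ["src/*.py"]
changed = ["src/parser.py", "tests/test_parser.py", ".github/workflows/ci.yml"]
violations = out_of_scope(changed, scope)
if violations:
    print("Refusing to commit; out-of-scope files:", violations)
```

With the sample inputs above, the test file and the CI workflow are reported as violations, which are exactly the kinds of files the finding is concerned about.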

Vague Triggers

Medium

Confidence: 91%
Finding: The README encourages invoking the skill with very broad phrases like 'optimize this' and 'improve performance', which can cause the agent to trigger an autonomous code-modification loop in contexts where the user did not explicitly consent to repository changes. In this skill, the risk is amplified because the documented behavior includes iterative edits, commits, benchmarks, and resets, so ambiguous triggering can lead to unintended destructive or high-impact actions.

Missing User Warnings

Medium

Confidence: 95%
Finding: The README describes a loop that performs code changes, git commits, test execution, benchmarking, and git resets, but it does not present an upfront safety warning that the skill will modify the repository and may discard unkept changes. This is dangerous because users may invoke it expecting advisory behavior, while the agent actually performs autonomous, potentially destructive actions that can affect working trees, local branches, or uncommitted work.
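
One way a user could guard against losing uncommitted work is a pre-flight check that refuses to start the loop on a dirty working tree. The sketch below is an assumption about how such a check might be written, not something the skill provides; the function names are hypothetical. It relies on `git status --porcelain`, which prints one line per changed or untracked path and nothing when the tree is clean.

```python
import subprocess


def dirty_paths(porcelain_output):
    """Parse `git status --porcelain` output into a list of affected paths.

    Each line has a two-character status code, a space, then the path.
    """
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def working_tree_is_clean(repo_dir="."):
    """Return True when the repository has no modified or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return not dirty_paths(result.stdout)


# Example with captured output instead of a live repository.
sample = " M src/app.py\n?? notes.txt\n"
print(dirty_paths(sample))
```

Calling `working_tree_is_clean()` before the first iteration, and stopping if it returns False, would keep a later `git reset --hard` from discarding work that was never committed.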

Missing User Warnings

Medium

Confidence: 90%
Finding: This slide explicitly describes an autonomous loop that makes code changes, commits them, and performs destructive rollbacks with git reset, but it does not present any warning about uncommitted work, branch isolation, backup strategy, or reset safety. In the context of a skill meant to be followed by agents or users, that omission can lead to accidental loss of repository state or unsafe operation on a real working tree.

Missing User Warnings

Medium

Confidence: 94%
Finding: The example includes `kubectl apply -f k8s/` as part of an autonomous optimization loop, which can modify live cluster state if copied or run against a real context. In this skill's context, the risk is elevated because the tool is explicitly designed to run unattended and iterate on changes, increasing the chance of unintended deployment, service disruption, or configuration drift.
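
A simple mitigation is to check the active kubectl context against an allowlist of disposable clusters before any `kubectl apply` runs. The following is a sketch under stated assumptions: the context names in `ALLOWED_CONTEXTS` and both function names are hypothetical, and the only CLI call used is `kubectl config current-context`.

```python
import subprocess

# Hypothetical allowlist of disposable contexts; replace with your own.
ALLOWED_CONTEXTS = {"kind-autoimprove", "minikube"}


def context_is_allowed(current_context, allowed=ALLOWED_CONTEXTS):
    """Return True only if the given context name is on the allowlist."""
    return current_context.strip() in allowed


def current_kube_context():
    """Ask kubectl for the active context name (requires kubectl on PATH)."""
    result = subprocess.run(
        ["kubectl", "config", "current-context"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example with literal names instead of a live kubectl call.
print(context_is_allowed("minikube\n"))
print(context_is_allowed("prod-us-east-1"))
```

An allowlist is used rather than a blocklist so that an unrecognized context, including a production one, fails closed.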

Missing User Warnings

Medium

Confidence: 88%
Finding: The protocol explicitly instructs an autonomous agent to modify files, commit changes, and run destructive rollback operations without any mandatory user confirmation, dry-run mode, backup, or warning about repository state changes. In the context of an unattended optimization loop, this creates a real risk of unintended data loss, destructive repo mutations, and surprising side effects if the scope or environment is misconfigured.

Tool Parameter Abuse

High

Category: Tool Misuse
Content:
3. **Commit**: `git add <changed files> && git commit -m "autoimprove: <short description>"`. Verify HEAD changed by comparing `git rev-parse HEAD` before and after. If HEAD didn't change, the commit failed — stop and report.
4. **Test**: If `Check.test` is specified, run it. If tests fail: `git reset --hard HEAD~1`, log as "test_failed", continue to next iteration.
5. **Evaluate**: Run the score command (`Check.run`). If it times out, kill it and treat as failure.
Confidence: 97%
Finding: git reset --hard

Tool Parameter Abuse

High

Category: Tool Misuse
Content:
6. **Score**: Extract the score from stdout using the extraction method in `Check.score`.
7. **Guard**: If `Check.guard` is specified, extract the guard metric and check the threshold. If violated: `git reset --hard HEAD~1`, log as "guard_failed", continue.
8. **Decide**:
   - If the score command failed (non-zero exit or timeout): `git reset --hard HEAD~1`, log as "error"
Confidence: 97%
Finding: git reset --hard

Tool Parameter Abuse

High

Category: Tool Misuse
Content:
7. **Guard**: If `Check.guard` is specified, extract the guard metric and check the threshold. If violated: `git reset --hard HEAD~1`, log as "guard_failed", continue.
8. **Decide**:
   - If the score command failed (non-zero exit or timeout): `git reset --hard HEAD~1`, log as "error"
   - If the score improved: keep the commit, update the baseline, log as "kept"
   - If the score is equal AND `keep_if_equal` is true: keep the commit, log as "kept_equal"
   - If the score did not improve: `git reset --hard HEAD~1`, log as "discarded"
Confidence: 97%
Finding: git reset --hard
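
The Decide step quoted in this finding can be read as a small pure function, which makes it easier to see which branches end in `git reset --hard HEAD~1`. The sketch below is an interpretation of the excerpt, not the skill's own code. The function name `decide` and its parameters are hypothetical, and `higher_is_better` is an assumption, since the excerpt does not say in which direction a score counts as improved.

```python
def decide(score, baseline, score_cmd_ok=True, keep_if_equal=False,
           higher_is_better=True):
    """Return (log_label, keep_commit) for one iteration.

    keep_commit=False corresponds to running `git reset --hard HEAD~1`.
    """
    if not score_cmd_ok:
        # Non-zero exit or timeout of the score command.
        return "error", False
    improved = score > baseline if higher_is_better else score < baseline
    if improved:
        return "kept", True
    if score == baseline and keep_if_equal:
        return "kept_equal", True
    return "discarded", False


print(decide(0.9, 0.8))
print(decide(0.8, 0.8, keep_if_equal=True))
print(decide(0.7, 0.8))
```

Three of the four outcomes discard the commit, so every iteration that does not strictly improve the score (or tie, when `keep_if_equal` is set) triggers a hard reset. That is why running the loop on a tree with uncommitted work is risky.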

Tool Parameter Abuse

High

Category: Tool Misuse
Content:
   - If the score command failed (non-zero exit or timeout): `git reset --hard HEAD~1`, log as "error"
   - If the score improved: keep the commit, update the baseline, log as "kept"
   - If the score is equal AND `keep_if_equal` is true: keep the commit, log as "kept_equal"
   - If the score did not improve: `git reset --hard HEAD~1`, log as "discarded"
9. **Log**: Save experiment JSON to `.autoimprove/experiments/NNN-slug.json` with this schema: ```json
Confidence: 97%
Finding: git reset --hard

VirusTotal

64/64 vendors flagged this skill as clean.
