Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

SkillBench

v2.0.0

Track skill versions, benchmark performance, compare improvements, and get self-improvement signals. Integrates with tasktime and ClawVault.

0· 1.3k·1 current·1 all-time
Security Scan
VirusTotalVirusTotal
Suspicious
View report →
OpenClawOpenClaw
Suspicious
medium confidence
Purpose & Capability
Name and description match the declared binary and CLI functionality (skillbench CLI). The install spec (npm @versatly/skillbench → skillbench binary) is consistent with the stated purpose of a benchmarking CLI. However the registry metadata provides no homepage or source repo to review.
!
Instruction Scope
SKILL.md instructs the agent to call the skillbench CLI and the tasktime ('tt') CLI and to sync with external services (ClawVault, ClawHub). The skill's requires.bins only lists 'skillbench' and does not declare 'tt' or any other external tool it references, and it does not declare where ClawVault/ClawHub credentials come from — so the runtime instructions rely on tools/credentials not described in the skill manifest.
Install Mechanism
Install uses npm (@versatly/skillbench) to create a global 'skillbench' binary — a common pattern for CLIs but one that executes third-party code during install/use. There is no homepage or source URL in the metadata to audit the package, increasing the risk because arbitrary npm package code would run on install and at runtime.
!
Credentials
The SKILL.md describes automatic syncing to ClawVault and ClawHub and interaction with external dashboards and CI. Yet the skill declares no required environment variables or auth tokens. This is a mismatch: the CLI likely needs credentials or config files to access those services, but the skill does not declare where those credentials come from or what variables/paths it will read.
Persistence & Privilege
The skill is not 'always' and does not request elevated platform privileges in the manifest. It installs a CLI binary (global npm install) but does not declare modifying other skills or agent-wide config; that is within normal bounds for a user-invokable CLI skill.
What to consider before installing
This skill appears to be a legitimate benchmarking CLI, but there are several red flags you should address before installing: 1) npm packages run code on install and at runtime — review the package source (GitHub repo) and the published package content before running npm install globally; 2) SKILL.md references the tasktime 'tt' CLI and external services (ClawVault/ClawHub) but the manifest doesn't declare those dependencies or any auth variables — verify how the CLI obtains credentials and ensure it won't read unexpected config files or exfiltrate data; 3) Prefer to install and test this tool in an isolated environment (container or VM) first, inspect what network endpoints it contacts, and check what files/dirs it writes; 4) If you plan to give it access to service tokens, issue scoped tokens with minimal privileges and rotate them after testing; 5) If you need help auditing the npm package contents or confirming the CLI's network behavior, provide the package URL or the package tarball and I can help review it. Proceed with caution.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Binsskillbench

Install

Install SkillBench CLI (npm)
Bins: skillbench
npm i -g @versatly/skillbench
latestvk979nepnbje3v4wk69tg527ydn80wsgp
1.3kdownloads
0stars
6versions
Updated 6h ago
v2.0.0
MIT-0

skillbench Skill

Self-improving skill ecosystem for AI agents.

Track skill versions, benchmark performance, compare improvements, and get signals on what to fix next.

Part of the ClawVault ecosystem | tasktime | ClawHub

Installation

npm install -g @versatly/skillbench

The Loop

1. Use a skill    → skillbench use github@1.0.0
2. Do the task    → tt start "Create PR" && ... && tt stop
3. Record result  → skillbench record "Create PR" --success
4. Check scores   → skillbench score github
5. Improve skill  → Update skill, bump version
6. Repeat         → Compare v1.0.0 vs v1.1.0

Commands

Track Skills

skillbench use github@1.2.0            # Set active skill version
skillbench skills                       # List tracked skills + signals

Record Benchmarks

# Auto-pulls duration from tasktime
skillbench record "Create PR" --success

# Manual duration
skillbench record "Create PR" --duration 45s --success

# Record failures
skillbench record "Create PR" --fail --error-type "auth-error"

Score & Compare

skillbench score                        # All skills with grades
skillbench score github                 # Single skill
skillbench compare github@1.0.0 github@1.1.0

Export & Dashboard

skillbench export --format markdown
skillbench export --format json
skillbench dashboard                    # Generate HTML dashboard
skillbench dashboard --open             # Generate and open in browser

Automated Testing

skillbench test tasktime@1.1.0          # Run smoke test
skillbench test tasktime@1.1.0 --suite full  # Run named suite
skillbench test tasktime@1.1.0 --dry-run     # Test without recording

Sync

skillbench sync --clawhub               # Import installed skills
skillbench sync --vault                 # Sync to ClawVault
skillbench sync --all                   # Everything

Health & Monitoring

skillbench health                       # Overall health report with alerts
skillbench watch --once                 # Run all test suites once
skillbench watch --interval 300         # Continuous monitoring every 5 min

Analysis & Improvement

skillbench improve                      # Get suggestions for weakest skill
skillbench improve github               # Improvement plan for specific skill
skillbench trend tasktime --days 30     # Performance trend over time
skillbench leaderboard                  # Compare agents (multi-agent setups)
skillbench schedule --interval 60       # Generate cron config for auto-testing

Baselines & Regression Detection

skillbench baseline tasktime --set      # Set baseline from current performance
skillbench baseline --list              # List all baselines
skillbench baseline --check             # Check all baselines (CI-friendly, exits 1 if failing)
skillbench baseline tasktime --remove   # Remove a baseline

CI/CD Integration

skillbench ci                           # Run all tests + baseline checks
skillbench ci --json                    # JSON output for automation
skillbench badge                        # Generate shields.io badges for README

Copy examples/github-action.yml for ready-to-use GitHub Actions workflow.

Grading System

GradeScoreMeaning
🏆 A+95-100Elite performance
✅ A85-94Excellent
👍 B70-84Good
⚠️ C50-69Needs work
❌ D<50Broken

Based on: Success Rate (40%), Avg Duration (30%), Consistency (20%), Trend (10%)

tasktime Integration

When you omit --duration, skillbench pulls from tasktime:

tt start "Create PR" -c git
# ... do work ...
tt stop
skillbench record --success   # Duration auto-pulled

ClawVault Integration

Benchmarks sync to ClawVault automatically.

Improvement Signals

skillbench skills shows:

  • ⚠️ needs work — Success rate below 70%
  • 🕐 stale — No benchmarks in 7+ days
  • ↘️ declining — Getting worse over time

Related

Comments

Loading comments...