Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Benchmark Store

v1.1.1

Use when you need to initialize the benchmark database, compare a skill's scores against historical baselines, check whether any dimension of the Pareto front has regressed, or consult the quality grading standards. Not for scoring candidates (use improvement-discriminator) or for automatic improvement (use improvement-learner).

by _silhouette@lanyasheng
MIT-0
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Suspicious
high confidence
Purpose & Capability
Name and description align with the provided files: there are scripts for a benchmarks DB, Pareto checks, evaluation standards, fixtures, and interfaces for frozen/hidden tests. However, there are small inconsistencies (SKILL.md frontmatter declares version 0.1.0 vs registry version 1.1.1), and the runtime CLI/compare workflow described in SKILL.md does not cleanly match the implementation in scripts/benchmark_db.py (see Instruction Scope). Overall, the capability requested (local DB, JSON state, hidden tests) is coherent with the stated purpose.
Instruction Scope
SKILL.md instructs running scripts/benchmark_db.py --action compare as a CLI action, but the compare implementation (compare_with_benchmark) requires an evaluator callable parameter and raises ValueError if evaluator is None. The SKILL.md examples omit how this evaluator is provided, which is a functional incoherence: the CLI implies standalone invocation, but the code requires an injected callable. SKILL.md and the code also reference state/pareto.json and hidden test loading that requires a password/key for decryption, yet no guidance is given for supplying or protecting that secret. Finally, the instructions assume read/write access to local files (benchmarks.db, state/pareto.json); that is expected for this skill but should be stated explicitly.
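One way to close the evaluator gap is a small driver that injects the callable instead of relying on --action compare alone. This is a sketch only: the evaluator contract (one test case in, one numeric score out) and the compare_with_benchmark call shape are assumptions inferred from the scan summary, not confirmed signatures.

```python
# Hypothetical driver for scripts/benchmark_db.py's compare path.
# The evaluator signature below is an assumption; check the actual
# compare_with_benchmark() definition in the skill before relying on it.

def evaluator(test_case):
    """Run one test case and return a numeric score (assumed contract).

    Deliberately unimplemented: the skill refuses mocked scores, so a
    real evaluation routine must be supplied by the caller.
    """
    raise NotImplementedError("supply a real evaluation routine")

# Instead of invoking `--action compare` directly, import the module and
# inject the callable so the code does not raise ValueError on evaluator=None:
#
#   from scripts.benchmark_db import compare_with_benchmark  # path per scan
#   result = compare_with_benchmark(candidate_id, evaluator=evaluator)
```

Keeping the evaluator a plain function you wrote and reviewed is also a safety control: it guarantees the compare path only executes code you control.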
Install Mechanism
No install spec is provided and the skill is delivered as source/CLIs and docs only. That is low-risk from an install-source perspective (no remote downloads or package installs).
Credentials
The registry metadata declares no required env vars or credentials, but the code supports loading encrypted hidden tests and derives/uses passwords (HiddenTestDataSource/FileHiddenTestDataSource, HiddenTestSuite.unlock/load requiring password). The SKILL.md does not declare how passwords/keys are provided (no primaryEnv or envVars), so secret handling is under-specified. The scripts will create and write local files (benchmarks.db, state/pareto.json, exported reports) — that is expected but you should ensure file paths are appropriate and writable. No network endpoints or external credentials are requested in the files reviewed.
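Since the skill declares no env vars, you must choose the secret-handling convention yourself. A minimal pattern is to read the decryption password from the environment and fail loudly rather than fall back to anything hardcoded. The variable name BENCHMARK_HIDDEN_KEY and the commented unlock() call are assumptions, not part of the skill's declared interface.

```python
import os


def load_hidden_test_password(var="BENCHMARK_HIDDEN_KEY"):
    """Fetch the hidden-test decryption password from the environment.

    The env var name is an assumption chosen for this sketch; the skill
    itself declares no primaryEnv or envVars, so pick and document your own.
    """
    password = os.environ.get(var)
    if not password:
        raise RuntimeError(
            f"{var} is not set; refusing to fall back to a hardcoded secret"
        )
    return password


# Usage against the skill's API would look roughly like (unverified names):
#   suite = HiddenTestSuite(...)            # from the skill's interfaces/
#   suite.unlock(load_hidden_test_password())  # decrypt in memory only
```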
Persistence & Privilege
The skill does not request always:true or other elevated platform privileges. It reads/writes local state files (SQLite DB, state/pareto.json) and exports reports; that is consistent with a benchmark store and within expected privilege for such a skill.
What to consider before installing
- Functionality vs CLI: The compare CLI examples assume the script runs standalone, but scripts/benchmark_db.py's compare path expects an evaluator callable (a function that runs tests and returns scores). Confirm how you will supply that evaluator when invoking --action compare; otherwise the script deliberately refuses to return mocked scores.
- Hidden tests / secrets: The hidden-test loading and decryption APIs require a password/key that is not declared as an env var. Decide how to store and provide any decryption password (avoid embedding it in the repo), and understand where decrypted test data will sit in plaintext and who can access it.
- Filesystem writes: The skill creates or updates benchmarks.db and state/pareto.json and can export reports. Run it in a sandbox or a controlled working directory to avoid accidental writes to sensitive locations.
- Code review: Because this package contains Python code (interfaces/, scripts/), review the full entry paths (the truncated main() in benchmark_db.py and any functions that may import dynamic evaluators or execute code) to confirm there is no dynamic execution of untrusted code (exec/eval/os.system). The static check did not flag obvious exec/os.system usage, but verify the truncated portions yourself.
- No network credentials by default: The package requests no API keys or network credentials. If you later integrate external benchmark imports or remote data sources, require explicit review and keep secrets out of repo files.
- Next steps: If you intend to run comparisons, prepare or inspect an evaluator implementation that matches the expected signature, confirm how encrypted hidden tests are unlocked (password handling), and run the skill in a disposable environment first.
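The filesystem-writes point above can be enforced mechanically by running the skill from a throwaway working directory, so benchmarks.db and state/pareto.json are created there rather than somewhere sensitive. The commented invocation is illustrative; verify the actual CLI flags against SKILL.md.

```shell
# Create a disposable working directory for all of the skill's local writes.
workdir="$(mktemp -d)"
cd "$workdir"

# The skill writes state/pareto.json relative to its working directory
# (per the scan summary), so pre-create the expected subdirectory here.
mkdir -p state

# python /path/to/scripts/benchmark_db.py --action compare ...
#   (invocation is hypothetical; see the evaluator caveat above)

echo "running from: $workdir"
```

When you are done, deleting the directory removes the SQLite DB, the Pareto state, and any exported reports in one step.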

Like a lobster shell, security has layers — review code before you run it.

latest: vk971mq0144mgxh5svpnxvqgc5184ag0p

License

MIT-0
Free to use, modify, and redistribute. No attribution required.
