Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Improvement Discriminator

v1.0.0

Use when you need multi-reviewer blind scoring of improvement candidates, LLM-based semantic evaluation, a decision on whether a candidate should be accepted, or an explanation of why every score came back "hold". Supports --panel multi-reviewer blind review and --llm-judge semantic evaluation. Not for structural evaluation (use improvement-learner) or gate decisions (use improvement-gate).

by_silhouette@lanyasheng
Download zip
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Suspicious · View report →
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill's stated purpose is to score and rank improvement candidates, optionally using an LLM judge, and that aligns with the included code (critic engine, human review, llm_judge). However, the registry metadata lists no required environment variables, while SKILL.md and llm_judge.py explicitly document and use ANTHROPIC_API_KEY / OPENAI_API_KEY (and optional base URLs). This mismatch is noteworthy: the skill can operate in mock mode without keys, but using real LLM backends requires credentials that were never declared in the registry metadata.
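The mismatch above matters mostly at backend-selection time. A minimal sketch (a hypothetical helper, not code from the skill) of how a judge could fall back to mock mode when neither documented key is present:

```python
import os

def choose_backend(requested: str = "auto") -> str:
    """Pick a judge backend, falling back to 'mock' when no API key is set.

    Illustrative only: the registry declares no env vars, but any real
    backend needs ANTHROPIC_API_KEY or OPENAI_API_KEY per SKILL.md.
    """
    has_key = bool(os.environ.get("ANTHROPIC_API_KEY")
                   or os.environ.get("OPENAI_API_KEY"))
    if requested == "auto":
        return "real" if has_key else "mock"
    if requested != "mock" and not has_key:
        raise RuntimeError(
            "set ANTHROPIC_API_KEY or OPENAI_API_KEY to use a real backend")
    return requested
```

With a guard like this, an undeclared-credentials environment degrades to offline mock scoring instead of failing mid-run.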
Instruction Scope
SKILL.md instructs running scripts/score.py with --input and optional flags (--panel, --llm-judge, --use-evaluator-evidence), which matches the repo's scripts. The instructions do not overtly request arbitrary system secrets, but the implementation reads and writes state (default state/ path) and can save human-review receipts to disk. More importantly, the Critic/RealSkillEvaluator supports loading Python skill modules from file paths and invoking their evaluate()/execute() functions; this can execute arbitrary code supplied as a candidate or present on disk, which expands the runtime scope well beyond simple scoring.
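The file-path loading pattern described above is worth seeing concretely. A sketch (function and module names are illustrative, not taken from the skill's source) of why loading a candidate module is itself code execution:

```python
import importlib.util

def load_candidate(path: str):
    """Load a candidate skill module from an arbitrary file path.

    The file's top-level code runs at load time, before evaluate() or
    execute() is ever called -- this is the risk flagged in the review.
    """
    spec = importlib.util.spec_from_file_location("candidate", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes arbitrary top-level code
    return module
```

Even a candidate whose evaluate() looks harmless can perform side effects at import time, which is why untrusted paths should only be evaluated in a sandbox.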
Install Mechanism
No install spec is provided (instruction-only install), and the package itself performs no external downloads. The code may import third-party SDKs (anthropic, openai) at runtime if the user enables those backends, but there is no automatic installer or external URL fetch in the provided metadata.
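The runtime-only SDK dependency can be handled with lazy imports. A sketch under assumptions: the package names anthropic and openai come from this review; the constructor calls are those SDKs' standard entry points, not code from this skill:

```python
def make_client(provider: str):
    """Return an LLM client for the selected backend, importing its SDK
    only when that backend is actually chosen."""
    if provider == "mock":
        return None  # deterministic offline scoring, no SDK or network
    if provider == "anthropic":
        import anthropic  # third-party; imported only on demand
        return anthropic.Anthropic()
    if provider == "openai":
        import openai
        return openai.OpenAI()
    raise ValueError(f"unknown provider: {provider!r}")
```

This keeps the mock path usable on machines where neither SDK is installed.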
Credentials
Registry metadata declares no required env vars, but SKILL.md and interfaces/llm_judge.py document and use ANTHROPIC_API_KEY and OPENAI_API_KEY (and support ANTHROPIC_BASE_URL). When --llm-judge is used with a real backend, the skill attempts to call networked LLM providers, which requires those keys; this is a mismatch between declared requirements and actual code. Additionally, the code inserts a sibling 'benchmark-store' path onto the import path and attempts to import from it, which implies access to other local skill code and config and increases the scope of data reachable at runtime.
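The sibling-path insertion mentioned above looks roughly like the following sketch (the function name and example path are illustrative; only the 'benchmark-store' directory name comes from the review):

```python
import sys
from pathlib import Path

def add_sibling_to_path(skill_dir: str, name: str = "benchmark-store") -> Path:
    """Make a sibling skill's directory importable.

    Once on sys.path, any module inside that directory can be imported
    (and therefore executed) by this skill -- the scope-expansion risk
    noted in the review.
    """
    sibling = Path(skill_dir).resolve().parent / name
    sys.path.insert(0, str(sibling))
    return sibling
```

The effect is that this skill's runtime behavior depends on whatever code happens to live in a neighboring directory it does not own.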
Persistence & Privilege
The skill uses default settings for persistence (always: false) and autonomous invocation. It writes state and review receipts (HumanReviewReceipt.save writes files under the state paths), and the critic engine may load external benchmark or skill modules from disk. It does not explicitly modify other skills' configurations, but loading and executing other Python modules grants it runtime privilege equivalent to arbitrary code execution if untrusted modules or paths are supplied.
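For the state-writing side, a hypothetical sketch of receipt persistence (the real HumanReviewReceipt.save may differ in field names and layout); the point is that every write stays under one explicit state directory an auditor can inspect:

```python
import json
import time
from pathlib import Path

def save_receipt(state_dir: str, candidate_id: str, verdict: str) -> Path:
    """Write a review receipt as JSON under the given state directory.

    Illustrative only: mirrors the kind of on-disk receipt described in
    the review, scoped to a single auditable directory.
    """
    root = Path(state_dir)
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"receipt-{candidate_id}.json"
    path.write_text(json.dumps({
        "candidate": candidate_id,
        "verdict": verdict,
        "timestamp": time.time(),
    }, indent=2))
    return path
```

Pointing state_dir at a throwaway directory during a first sandboxed run makes it easy to diff exactly what the skill persisted.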
What to consider before installing
Key points before installing or using this skill:
- SKILL.md mentions LLM backends that require ANTHROPIC_API_KEY or OPENAI_API_KEY, but the registry metadata lists no required env vars. Treat API keys as optional only if you use the mock backend; if you enable --llm-judge with a real backend, provide credentials only if you trust the skill.
- The evaluator can load and call arbitrary Python modules (RealSkillEvaluator / importlib). Do not point it at untrusted directories or candidate artifacts containing executable code unless you sandbox the run (e.g., an isolated container or restricted runtime).
- The skill reads and writes local state (default state/). Review scripts/score.py and the RealSkillEvaluator implementation to confirm exactly which files are read and written and whether any network calls beyond the LLM providers occur (some files were truncated in the listing).
- To avoid external API calls entirely, run with --llm-judge mock and/or inspect or disable the LLMJudge backend code.
- Recommended actions: inspect scripts/score.py and the RealSkillEvaluator code paths in full, run the skill in a sandboxed environment first, and supply API keys only if needed and only once you understand which vectors (file loading, outbound network) will be used.

Like a lobster shell, security has layers — review code before you run it.

latest · vk974mwf11wkw1db1dfgebx13wx848mn0

