Smartness Eval Open Source
v0.3.3 · A comprehensive intelligence-evaluation skill for OpenClaw. Produces an overall score, evidence, risks, and trends across 14 dimensions (including planning ability and hallucination control). Aligned with the CLEAR/T-Eval/Anthropic industry standards.
by 圆规 (@yh22e)
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Verdict: Benign (medium confidence)

Purpose & Capability
The name and description match what the package actually contains: a local evaluation framework that reads runtime state files and runs capability tests. The listed inputs (state/*.json, .reasoning/*.sqlite, logs) are reasonable data sources for an evaluation tool, and the declared optional API keys are only for an opt-in LLM-judge feature. Dependencies on other OpenClaw core scripts (cognitive-kernel-v6.py, api-fallback-v5.py, etc.) are expected for in-workspace testing.
Instruction Scope
SKILL.md explicitly states the tool is read-only for specific state files and writes only under state/smartness-eval/. It also documents a validate_command() gate and notes that network access is off by default. This is coherent; however, the test suite executes external workspace scripts (e.g., scripts/cognitive-kernel-v6.py) that are not bundled here, and those scripts determine the ultimate runtime behavior (they could read additional files or call network endpoints). To be fully confident, review eval.py (the validate_command implementation) and the external test scripts that will be executed.
Install Mechanism
No install spec and no external downloads; the skill is instruction/code-only and runs from the workspace. That minimizes supply-chain risk from arbitrary downloads. The repo includes its own Python scripts and JSON configs; nothing in the manifest pulls code from external URLs.
Credentials
No required env vars, and the only optional credentials are DEEPSEEK_API_KEY or OPENAI_API_KEY for the explicitly opt-in --llm-judge feature. That is proportionate to the documented behavior. The skill does not request unrelated cloud credentials or broad tokens in its metadata.
Persistence & Privilege
The always flag is false; the skill claims to write outputs only to state/smartness-eval/ and not to modify config or other skills. This matches the manifest and docs. It does execute subprocesses (expected for tests) but does not request permanent installation or elevated platform privileges.
Assessment
What to check before installing or running:
- Review scripts/eval.py and scripts/state_probe.py (the run-time engine) to confirm validate_command() actually blocks inline execution, absolute paths, path traversal, shell=True, and network unless --llm-judge is explicitly used.
- Understand that the test commands call other workspace scripts (e.g., scripts/cognitive-kernel-v6.py, scripts/api-fallback-v5.py) which are NOT bundled here; inspect those external scripts (in your OpenClaw workspace) to ensure they don't read or send data beyond what you expect.
- The skill will read many state files and a reasoning SQLite DB. Ensure those files do not contain secrets or sensitive user data you don't want evaluated tools to access.
- The LLM-judge feature is opt-in and requires setting DEEPSEEK_API_KEY or OPENAI_API_KEY; do not set those env vars unless you accept sending the aggregated summary described in docs.
- If you need stronger assurance, provide the full eval.py source (the runtime portion that builds, validates, and executes commands) for review; that would raise confidence to high.

Like a lobster shell, security has layers: review code before you run it.
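When reading eval.py against the first checklist item, it helps to know what a gate of that kind typically looks like. The sketch below is an illustrative reconstruction, not the skill's actual implementation; the allow-list prefix and the exact rejection rules are assumptions:

```python
import shlex

ALLOWED_PREFIX = "scripts/"  # hypothetical allow-list root

def validate_command(cmd: str) -> bool:
    """Illustrative command gate: reject inline execution, absolute
    paths, and path traversal before any subprocess call. The caller
    is expected to run shlex.split(cmd) with shell=False, never
    shell=True."""
    try:
        parts = shlex.split(cmd)
    except ValueError:
        return False
    if not parts:
        return False
    # Shell metacharacters imply inline execution ($(...), `...`, pipes).
    if any(ch in cmd for ch in (";", "|", "&", "$", "`", ">", "<")):
        return False
    path = parts[0]
    # Reject absolute paths and path traversal outside the workspace.
    if path.startswith("/") or ".." in path.split("/"):
        return False
    return path.startswith(ALLOWED_PREFIX)
```

Whatever the real implementation looks like, confirm it enforces each of these properties; a gate that checks only the executable name, for example, would still pass `scripts/eval.py; curl ...` through a shell.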
latest: vk9725bgaxhqq13t4eewdgt36gx83cfmn
