Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Agent Eval Suite

Provides benchmark testing, A/B testing, performance regression detection, and simulation environment testing for agent evaluation.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 11 · 0 current installs · 0 all-time installs
Security Scan
VirusTotal: Pending
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name/description match the provided code: Benchmark, ABTester, RegressionDetector, Simulator, and ReportGenerator are present and implement the advertised testing and analysis features.
Instruction Scope
The usage examples in SKILL.md are limited and appropriate, but Simulator.loadScenario reads JSON from the filesystem with fs.readFileSync and accepts absolute paths (path.isAbsolute(scenarioName) ? scenarioName : ...). Given an absolute path, the skill can therefore read arbitrary files, which SKILL.md does not document and which goes beyond the advertised sandbox/simulation behavior. The Simulator also offers fault injection that can allocate memory (a 10 MB Buffer.alloc) and sleep for long periods, which could be used to consume host resources.
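The absolute-path issue described above could be mitigated by resolving scenario names inside a fixed directory. The following is a minimal sketch, not the skill's actual code: `resolveScenarioPath` and the `fixturesDir` parameter are hypothetical names introduced for illustration.

```javascript
const path = require('node:path');

// Hypothetical helper (not part of the skill): resolves a scenario name
// inside a fixed fixtures directory and rejects anything that escapes it.
function resolveScenarioPath(fixturesDir, scenarioName) {
  if (path.isAbsolute(scenarioName)) {
    throw new Error(`Absolute scenario paths are not allowed: ${scenarioName}`);
  }
  const root = path.resolve(fixturesDir);
  const candidate = path.resolve(root, scenarioName);
  // path.relative returns '..' segments when candidate lies outside root,
  // and may return an absolute path when the two are on different drives.
  const rel = path.relative(root, candidate);
  const escapes =
    rel === '' ||
    rel === '..' ||
    rel.startsWith(`..${path.sep}`) ||
    path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Scenario escapes fixtures directory: ${scenarioName}`);
  }
  return candidate;
}
```

This check works on path strings only; it does not follow symlinks, so a symlink inside the fixtures directory could still point elsewhere.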
Install Mechanism
This is an instruction-only skill with no declared install spec in the registry. SKILL.md suggests running npm install locally; package.json is included, and there are no remote download URLs or install hooks, so no high-risk install mechanism was detected.
Credentials
The skill does not request environment variables, credentials, or config paths. No code reads environment variables or external secrets. Requested access is proportional to the described functionality.
Persistence & Privilege
Skill is not always-enabled and does not request persistent privileges or modify other skills or agent-wide configs. No autonomous invocation flag escalation beyond platform defaults.
What to consider before installing
This package appears to implement the evaluation features it claims, but review it before use:
1) Simulator.loadScenario reads files from disk and accepts absolute paths, so it can read arbitrary JSON files from the host. Do not pass untrusted scenario names, and do not run it as a privileged user.
2) Chaos/fault injection can allocate memory and sleep for long periods; running large simulations or untrusted scenarios could exhaust resources or cause timeouts.
3) The code performs no remote network activity, so the risk is limited to local file and resource exposure. Run the tool in a sandbox or container, inspect the simulator's scenario-loading and fault-injection code, and avoid supplying scenario names or files from untrusted sources.
For higher assurance, ask the author to sanitize scenario path handling (disallow absolute paths or restrict loading to a safe fixtures directory) and to make fault-injection limits configurable and documented.
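As an illustration of what configurable fault-injection limits might look like, here is a rough sketch. The names `DEFAULT_LIMITS`, `clampFault`, and `injectFault`, and the limit values themselves, are assumptions made for this example and are not taken from the skill's code.

```javascript
// Hypothetical limits; a real tool would document these and let callers override them.
const DEFAULT_LIMITS = { maxAllocBytes: 1024 * 1024, maxSleepMs: 1000 };

// Coerce a requested amount into the range [0, max]; invalid input becomes 0.
function clamp(value, max) {
  const n = Math.floor(Number(value));
  return Number.isFinite(n) ? Math.min(Math.max(0, n), max) : 0;
}

// Reduce a requested fault to what the limits allow.
function clampFault(fault, limits = DEFAULT_LIMITS) {
  return {
    allocBytes: clamp(fault.allocBytes ?? 0, limits.maxAllocBytes),
    sleepMs: clamp(fault.sleepMs ?? 0, limits.maxSleepMs),
  };
}

// Apply memory pressure and a delay, both bounded by the limits.
async function injectFault(fault, limits = DEFAULT_LIMITS) {
  const { allocBytes, sleepMs } = clampFault(fault, limits);
  const pressure = Buffer.alloc(allocBytes); // referenced until the sleep finishes
  await new Promise((resolve) => setTimeout(resolve, sleepMs));
  return { allocatedBytes: pressure.length, sleptMs: sleepMs };
}
```

With these defaults, a request for a 10 MB allocation and a 60-second sleep is reduced to 1 MiB and 1 second instead of being applied as-is.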


Current version: v0.1.0
latest vk979p9zn5fs1et8ytp2qf657ad83yk4e

SKILL.md

Agent Eval Suite

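An agent evaluation suite that provides benchmark testing, A/B testing, performance regression detection, and simulated-environment testing.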

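Features

1. Benchmark framework

  • Standardized test cases
  • Multi-dimensional evaluation metrics
  • Baseline comparison
  • Scoring system

2. A/B testing

  • Control group design
  • Random assignment
  • Statistical significance testing
  • Result analysis

3. Performance regression detection

  • Historical performance comparison
  • Regression alerts
  • Performance trend charts
  • Root cause analysis

4. Simulated environment testing

  • Sandbox environment
  • Scenario simulation
  • Boundary condition testing
  • Fault injection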

Installation

npm install

Usage

Benchmark

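const { Benchmark } = require('./src');

// `agent` and `task` are placeholders for your own agent instance and task input.
async function main() {
  const benchmark = new Benchmark({ iterations: 100 });

  // Register a named test; `execute` is the function under test.
  benchmark.addTest('task-completion', {
    execute: async () => agent.completeTask(task)
  });

  // Top-level `await` is not available in CommonJS, so run inside an async function.
  const results = await benchmark.run();
  console.log(results);
}

main().catch(console.error);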
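SKILL.md does not say which metrics Benchmark reports. To give a sense of what timing metrics for repeated runs might look like, here is a self-contained sketch; `timeIterations`, `percentile`, and `summarize` are hypothetical helpers and are not the skill's API.

```javascript
const { performance } = require('node:perf_hooks');

// Run `execute` the given number of times and collect per-call durations in ms.
async function timeIterations(execute, iterations) {
  const durations = [];
  for (let i = 0; i < iterations; i += 1) {
    const start = performance.now();
    await execute();
    durations.push(performance.now() - start);
  }
  return durations;
}

// Nearest-rank percentile over an ascending-sorted array; `pct` is 0-100.
function percentile(sorted, pct) {
  const rank = Math.ceil((pct * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Reduce raw durations to a few summary statistics.
function summarize(durations) {
  if (durations.length === 0) {
    throw new Error('No durations recorded');
  }
  const sorted = [...durations].sort((a, b) => a - b);
  const total = sorted.reduce((sum, d) => sum + d, 0);
  return {
    count: sorted.length,
    mean: total / sorted.length,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    p95: percentile(sorted, 95),
  };
}
```

A call such as `timeIterations(() => agent.completeTask(task), 100).then(summarize)` would then produce one summary object per test, with `agent` and `task` again standing in for your own objects.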

ABTester

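const { ABTester } = require('./src');

// `oldPrompt` and `newPrompt` are placeholders for your own prompt runners.
async function main() {
  const ab = new ABTester({ confidenceLevel: 0.95 });

  // Define the control and treatment variants of the experiment.
  ab.createExperiment('new-prompt', {
    control: async () => oldPrompt(),
    treatment: async () => newPrompt()
  });

  const result = await ab.run('new-prompt', { sampleSize: 200 });
  console.log(result); // e.g. { winner: 'treatment', confidence: 0.97 }
}

main().catch(console.error);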
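SKILL.md does not state which statistical test ABTester uses to pick a winner. One common choice for success/failure outcomes is a two-proportion z-test, sketched below under that assumption. The function names and the `{ successes, trials }` input shape are hypothetical, and the normal CDF uses the Abramowitz-Stegun approximation of erf, so p-values are approximate.

```javascript
// Abramowitz-Stegun approximation 7.1.26 of erf(x), valid for x >= 0.
function erf(x) {
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return 1 - poly * Math.exp(-x * x);
}

// Standard normal CDF for z >= 0.
function normalCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Two-sided two-proportion z-test using the pooled standard error.
function twoProportionZTest(control, treatment, confidenceLevel = 0.95) {
  const p1 = control.successes / control.trials;
  const p2 = treatment.successes / treatment.trials;
  const pooled =
    (control.successes + treatment.successes) / (control.trials + treatment.trials);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / control.trials + 1 / treatment.trials));
  const z = se === 0 ? 0 : (p2 - p1) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  const significant = pValue < 1 - confidenceLevel;
  let winner = 'none';
  if (significant) {
    winner = z > 0 ? 'treatment' : 'control';
  }
  return { z, pValue, significant, winner };
}
```

For example, `twoProportionZTest({ successes: 100, trials: 200 }, { successes: 130, trials: 200 })` compares a 50% success rate against a 65% one. With equal rates the test reports no winner.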

RegressionDetector

const { RegressionDetector } = require('./src');

const detector = new RegressionDetector({ threshold: 0.1 });

detector.record('response-time', { version: 'v1.1.0', value: 1200 });

const regressions = detector.detect();
console.log(regressions);
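The example above records a single data point, and SKILL.md does not define how `threshold` is interpreted. As a sketch of one plausible reading, the function below treats the threshold as a maximum allowed fractional increase between consecutive versions of a metric where higher is worse (such as response time). `detectRegressions` and its record shape are assumptions for illustration.

```javascript
// Hypothetical sketch: flag a metric when it grows by more than `threshold`
// (as a fraction, e.g. 0.1 for 10%) relative to the previously recorded version.
function detectRegressions(records, threshold) {
  const regressions = [];
  for (let i = 1; i < records.length; i += 1) {
    const previous = records[i - 1];
    const current = records[i];
    // Relative change is undefined for a non-positive baseline, so skip it.
    if (previous.value > 0) {
      const change = (current.value - previous.value) / previous.value;
      if (change > threshold) {
        regressions.push({ from: previous.version, to: current.version, change });
      }
    }
  }
  return regressions;
}
```

Under this reading, going from 1000 ms in v1.0.0 to 1200 ms in v1.1.0 is a 20% increase and would be flagged with a threshold of 0.1, while an increase to 1050 ms would not.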

Testing

npm test

License

MIT

Files

9 total