Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

skill-tester-cn

v1.0.0

A Claude Code skill-testing framework. It automatically analyzes skill definitions, generates test cases, executes functional tests, and produces a detailed scored test report. Trigger this skill when the user asks to "test a skill", "evaluate a skill", "check whether a skill works", or "verify skill functionality".

by Zhou Chang (@zhouchang1988)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for zhouchang1988/skill-tester-cn.

Prompt preview (Install & Setup):
Install the skill "skill-tester-cn" (zhouchang1988/skill-tester-cn) from ClawHub.
Skill page: https://clawhub.ai/zhouchang1988/skill-tester-cn
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install skill-tester-cn

ClawHub CLI


npx clawhub@latest install skill-tester-cn
Security Scan

VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The name/description (a framework to analyze and test Claude skills) matches the SKILL.md: it parses SKILL.md, generates test cases, runs tests, and writes a report. Needing access to skill files and assets is expected. Note: the SKILL.md explicitly searches user paths like ~/.claude/skills/<name>/SKILL.md, which is consistent with the stated purpose but does imply filesystem access.
Instruction Scope
The instructions instruct the agent to locate and read target SKILL.md files and any packaged resources (scripts, assets, docs), then to "execute" tests ("模拟或实际运行" / simulate or actually run). That gives the agent discretion to run other skills' code, invoke scripts, or make network calls. While executing a skill under test is a legitimate testing step, the SKILL.md is vague about safety boundaries and does not require explicit user confirmation or sandboxing before running potentially arbitrary code or triggering network I/O—this could expose credentials, sensitive files, or cause unintended side effects.
Install Mechanism
Instruction-only skill with no install spec and no bundled code. Low friction: nothing is written to disk by an install step. This is consistent with an analysis/test helper that operates via instructions.
Credentials
The skill declares no required env vars or credentials, which is appropriate. However, because it reads other skills' SKILL.md and assets and may execute them, those target skills might themselves read environment variables, config files, or require credentials. The tester does not document safeguards to avoid exposing or forwarding secrets during testing.
Persistence & Privilege
Flags show always:false and no requested config-paths or persistent privileges. The skill does not ask to be force-enabled or to modify other skills' configs. That is appropriate.
Scan Findings in Context
[NO_SCAN_FINDINGS] expected: Regex scanner found nothing because the skill is instruction-only (SKILL.md and a Markdown template). No code files were present to analyze. Absence of findings does not imply the runtime behavior is safe.
What to consider before installing
This skill legitimately needs to read other skills' SKILL.md and assets to build tests, but its runtime instructions also permit executing those skills or their scripts. Before installing or running:

  1. Only run the tester in a sandboxed environment (isolated VM/container) or on a copy of the target skills.
  2. Manually review the target SKILL.md and any referenced scripts/assets for dangerous behavior before allowing "actual" execution.
  3. Require explicit user confirmation before the tester performs any execution or network access.
  4. Avoid running against skills that may access production credentials or sensitive config.
  5. Consider setting the agent to require user approval for each test case that would execute code.

These precautions reduce the risk of unintended data exposure or harmful side effects.

Like a lobster shell, security has layers — review code before you run it.

latest: vk976q7rnhxw2dywpfvzkz2b561856gq9
108 downloads · 0 stars · 1 version
Updated 1w ago
v1.0.0 · MIT-0

Skill Tester

Systematically test and evaluate Claude Code skills by analyzing skill definitions, generating comprehensive test cases, executing the tests, and producing a detailed scored report.

Testing Workflow

Execute the following steps in order:

1. Locate the Target Skill

Identify the skill to test:

User says: "Test the PDF skill"
→ Search: ~/.claude/skills/pdf/SKILL.md or pdf/SKILL.md

Common skill locations:

  • ~/.claude/skills/<skill-name>/SKILL.md
  • ./<skill-name>/SKILL.md
  • A path supplied by the user

If the skill cannot be found, ask the user for the correct path.
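The search order above can be sketched as a small helper. This is a minimal illustration, not code shipped with the skill; the function name `locate_skill` is hypothetical:

```python
from pathlib import Path
from typing import Optional

def locate_skill(name: str, user_path: Optional[str] = None) -> Optional[Path]:
    """Return the first existing SKILL.md for `name`, or None if not found."""
    candidates = []
    if user_path:  # a path supplied by the user takes priority
        candidates.append(Path(user_path))
    candidates += [
        Path.home() / ".claude" / "skills" / name / "SKILL.md",
        Path(name) / "SKILL.md",  # relative to the current directory
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None  # caller should ask the user for the correct path
```

When `locate_skill` returns `None`, the workflow falls back to asking the user, as described above.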

2. Parse the Skill Definition

Read the target skill's SKILL.md and extract:

From the frontmatter:

  • name - the skill identifier
  • description - what the skill does (used for trigger scenarios)

From the body content:

  • Core capabilities and features
  • Workflows or procedures
  • Packaged resources (scripts, reference docs, assets)
  • Usage examples or patterns
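The frontmatter extraction could look roughly like the following. This is a deliberately minimal sketch that assumes simple `key: value` pairs between `---` fences; real SKILL.md files may need a proper YAML parser:

```python
import re

def parse_skill_md(text: str) -> dict:
    """Split a SKILL.md into frontmatter fields and body (minimal sketch)."""
    meta, body = {}, text
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if match:
        for line in match.group(1).splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        body = match.group(2)
    return {"name": meta.get("name"),
            "description": meta.get("description"),
            "body": body}
```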

3. Generate a Test Plan

Create a comprehensive test plan covering:

A. Trigger Tests

  • Test whether the skill activates in its described scenarios
  • Test edge cases (similar but distinct requests)
  • Test non-trigger scenarios (cases that should not activate it)

B. Functional Tests — for each identified capability/feature:

  • Happy path (normal usage)
  • Edge cases (boundary conditions)
  • Error handling (invalid input)

C. Resource Tests (if applicable)

  • Script execution
  • Reference-document availability
  • Asset accessibility

4. Execute Tests

For each test case:

  1. Prepare the test prompt - write a user request that should trigger the feature
  2. Execute - apply the test prompt (simulate or actually run)
  3. Observe - record the skill's behavior
  4. Evaluate - compare against the expected outcome

Run tests using a consistent format:

Test case: [name]
Prompt: "[user request]"
Expected: [what should happen]
Actual: [what actually happened]
Result: Pass / Fail / Partial pass
Notes: [observations, issues, suggestions]
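The per-test record above maps naturally onto a small data structure. A sketch, with illustrative names (`TestCase`, `render` are not part of the skill):

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    prompt: str
    expected: str
    actual: str = ""
    result: str = "Pending"   # Pass / Fail / Partial pass
    notes: str = ""

    def render(self) -> str:
        """Format the record in the same shape the workflow uses."""
        return (
            f"Test case: {self.name}\n"
            f'Prompt: "{self.prompt}"\n'
            f"Expected: {self.expected}\n"
            f"Actual: {self.actual}\n"
            f"Result: {self.result}\n"
            f"Notes: {self.notes}"
        )
```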

5. Score Each Test

Score using the following rubric:

| Score | Meaning | Criteria |
|-------|---------|----------|
| 5 | Excellent | Flawless execution, meets all expectations |
| 4 | Good | Minor issues, core functionality works |
| 3 | Acceptable | Usable but with notable limitations |
| 2 | Poor | Major issues, barely usable |
| 1 | Failed | Does not work as intended |
| 0 | N/A | Test not applicable |

6. Generate the Test Report

Create a Markdown report with the following structure:

# Skill Test Report: [skill name]

**Test date:** [date]
**Skill location:** [path]

## Summary

- **Overall score:** [X]/5
- **Tests passed:** [X]/[total]
- **Tests failed:** [X]
- **Critical issues:** [list or "none"]

## Test Results

### 1. Trigger Tests

| Test case | Prompt | Expected | Actual | Score |
|-----------|--------|----------|--------|-------|
| ... | ... | ... | ... | ... |

### 2. Functional Tests

#### [Feature name]

| Test case | Description | Result | Score | Notes |
|-----------|-------------|--------|-------|-------|
| ... | ... | ... | ... | ... |

### 3. Resource Tests

[If applicable]

## Detailed Findings

### Strengths
- [What the skill does well]

### Weaknesses
- [What needs improvement]

### Recommendations
- [Specific suggestions for improvement]

## Test Environment

- **Claude model:** [model used]
- **Test method:** [simulated/executed]
- **Test depth:** [basic/comprehensive]

---

Report generated by skill-tester-cn

Save the report to the current working directory as: [skill name]-test-report-[timestamp].md
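The save path above can be assembled like this; `make_report_path` is a hypothetical helper, and the date-only timestamp matches the example filenames later in this document:

```python
from datetime import date
from pathlib import Path
from typing import Optional

def make_report_path(skill_name: str, out_dir: str = ".",
                     when: Optional[date] = None) -> Path:
    """Build `[skill name]-test-report-[timestamp].md` in the given directory."""
    stamp = (when or date.today()).isoformat()  # e.g. 2025-01-15
    return Path(out_dir) / f"{skill_name}-test-report-{stamp}.md"
```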

Testing Guidelines

Comprehensive Coverage

Test every capability mentioned in the skill's description and body; do not skip features.

Example: if a skill claims to support "PDF creation, editing, and rotation", test all three:

  1. Create a PDF
  2. Edit an existing PDF
  3. Rotate a PDF

Realistic Test Prompts

Use natural-language prompts that a real user would actually write:

✅ Good: "Help me merge these two PDFs"
❌ Bad: "Execute the PDF merge function"

Edge Cases to Consider

  • Empty input (empty files, blank strings)
  • Invalid input (wrong file types, malformed data)
  • Boundary conditions (very large files, many items)
  • Missing resources (referenced files do not exist)
  • Concurrent operations (multiple simultaneous requests)

Objective Scoring

Score based on actual behavior, not theoretical capability:

  • If a feature is documented but does not work → Fail
  • If a feature works differently than documented → Partial pass
  • If a feature works as documented → Pass

Handling Test Failures

When a test fails:

  1. Record the exact failure pattern
  2. Determine whether it is a skill issue or an environment issue
  3. Suggest potential fixes
  4. Continue testing the remaining features

Usage Example

User: "Test the docx skill"

Assistant:
1. Locate: ~/.claude/skills/docx/SKILL.md
2. Parse: read the skill definition
3. Identify capabilities:
   - Create new documents
   - Edit existing documents
   - Handle tracked changes
   - Add comments
   - Extract text
4. Generate test cases for each capability
5. Execute tests (simulated or actual)
6. Generate: docx-test-report-2025-01-15.md

Scoring Summary

Overall score calculation:

  • Average of all test scores (excluding N/A tests)
  • Rounded to 1 decimal place
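The calculation above is a one-liner once N/A tests (score 0 in the rubric) are filtered out. A sketch with an illustrative function name:

```python
def overall_score(scores: list) -> float:
    """Average all test scores, excluding N/A (score 0), to 1 decimal place."""
    applicable = [s for s in scores if s != 0]  # 0 marks "not applicable"
    if not applicable:
        return 0.0
    return round(sum(applicable) / len(applicable), 1)
```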

Score interpretation:

  • 4.5-5.0: Production ready
  • 3.5-4.4: Good, minor issues
  • 2.5-3.4: Needs improvement
  • 1.5-2.4: Major issues
  • 0.0-1.4: Unusable
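These bands map directly onto a threshold function; the thresholds come from the list above, while the function name is illustrative:

```python
def interpret_score(score: float) -> str:
    """Map an overall score onto the interpretation bands."""
    if score >= 4.5:
        return "Production ready"
    if score >= 3.5:
        return "Good, minor issues"
    if score >= 2.5:
        return "Needs improvement"
    if score >= 1.5:
        return "Major issues"
    return "Unusable"
```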
