ai-shifu-course-creator · Trigger Evaluation Report

Trigger Eval · 2026-04-02 · eval model: claude-sonnet-4-6 · improve model: claude-opus-4-6
Total eval cases: 40 (20 should trigger · 20 should not trigger)
Accuracy before optimization: 61% (Iter 1 · original description)
Accuracy after optimization: 98% (Iter 2 · Train 100% · Test 96%)
Optimization iterations: 2 (stopped early: all train cases passed in iteration 2)
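The early-stopping rule in the summary (stop as soon as every train case passes) can be sketched as a loop that alternates evaluation and rewriting. This is a minimal sketch, not the harness's own code: `optimize_description`, `evaluate`, and `improve` are hypothetical stand-ins for the eval model (claude-sonnet-4-6) and the improve model (claude-opus-4-6), and the iteration cap of 5 is an assumption.

```python
from typing import Callable


def optimize_description(
    description: str,
    evaluate: Callable[[str], tuple[int, int]],
    improve: Callable[[str], str],
    max_iterations: int = 5,  # hypothetical cap; the report ran only 2 iterations
) -> tuple[str, int]:
    """Run evaluate/improve rounds until every train case passes.

    `evaluate` returns (passed, total) on the train split and `improve`
    returns a rewritten description; both are hypothetical stand-ins.
    Returns the last evaluated description and the number of iterations run.
    """
    for iteration in range(1, max_iterations + 1):
        passed, total = evaluate(description)
        if passed == total:
            # Early stop: all train cases passed with this description.
            return description, iteration
        if iteration < max_iterations:
            description = improve(description)
    return description, max_iterations


# Train scores from the report: 15/24 in iteration 1, 24/24 in iteration 2.
train_scores = {"original": (15, 24), "optimized": (24, 24)}
final, n = optimize_description(
    "original",
    evaluate=lambda d: train_scores[d],
    improve=lambda d: "optimized",
)
print(final, n)  # optimized 2
```

Stopping on a perfect train score keeps the number of improve-model calls low; the held-out test split is then used only to check that the rewritten description generalizes.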

Description Comparison

Before optimization
Convert raw course material into optimized, runnable MarkdownFlow teaching scripts and deploy them as live courses through a five-phase pipeline covering segmentation, orchestration, generation, optimization, and deployment.
After optimization (proposed by Opus)
Use when the user works with AI-Shifu (AI师傅) courses in any capacity: creating, writing, editing, rewriting, optimizing, reordering, deploying, publishing, previewing, or managing MarkdownFlow (MDF) lesson scripts. Covers the full course lifecycle — from converting raw material into structured lessons, to scripting interactions (single-select, multi-select, input, branching), adding variables, images, and system prompts, to deploying and managing live courses on the AI-Shifu platform. Trigger on any mention of AI-Shifu, AI师傅, or MarkdownFlow course scripting.

Per-Iteration Results

Iteration 1

Original description

Train: 62% (15 / 24)
Test: 62% (10 / 16)
Recall: 22% (severe under-triggering)

Iteration 2

Optimized description

Train: 100% (24 / 24)
Test: 100% (16 / 16)
Recall: ~96% (2 cases triggered in only 2 of 3 runs)
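The report does not state how recall is computed. One reading that fits the Iteration 2 figures is that recall is counted per run over the 20 should-trigger cases: 3 runs each gives 60 runs, and two cases triggering in only 2 of 3 runs leaves 58/60, about 96.7%, close to the ~96% reported. Counting per case under the ≥2/3 pass rule would instead score those two cases as passes. The sketch below assumes the per-run reading; `run_level_recall` and the 18/2 breakdown of trigger counts are hypothetical.

```python
def run_level_recall(trigger_counts: list[int], runs_per_case: int = 3) -> float:
    """Fraction of runs that triggered, over should-trigger cases only.

    Assumption: recall is counted per run, not per case.
    """
    total_runs = len(trigger_counts) * runs_per_case
    return sum(trigger_counts) / total_runs


# Hypothetical Iteration 2 breakdown: 18 cases at 3/3, 2 cases at 2/3.
counts = [3] * 18 + [2] * 2
recall = run_level_recall(counts)
print(f"{recall:.1%}")  # 96.7%
```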

Case Details

Case · Dataset · Expected · Iter 1 · Iter 2
PASS (triggered in ≥2/3 runs, or in 0/3 runs for should-not-trigger cases)
FAIL
Flaky (2/3, counted as PASS)
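The pass rule above can be expressed as a small classifier over the number of runs (out of 3) in which a case triggered the skill, with accuracy as the share of cases labelled PASS or flaky. This is a sketch of the rule as stated, not the harness's own code; `classify`, `is_pass`, and the example cases are hypothetical.

```python
RUNS = 3  # each case is run 3 times, matching the x/3 counts in the report


def classify(should_trigger: bool, triggered: int) -> str:
    """Label one case from how many of its RUNS runs triggered the skill."""
    if not 0 <= triggered <= RUNS:
        raise ValueError(f"triggered must be in 0..{RUNS}, got {triggered}")
    if should_trigger:
        if triggered == RUNS:
            return "PASS"
        if triggered >= 2:
            return "FLAKY"  # 2/3: intermittent, but still counted as PASS
        return "FAIL"
    # A should-not-trigger case passes only if no run triggered (0/3).
    return "PASS" if triggered == 0 else "FAIL"


def is_pass(label: str) -> bool:
    """FLAKY counts toward accuracy just like PASS."""
    return label in ("PASS", "FLAKY")


labels = [
    classify(True, 3),   # should trigger, 3/3
    classify(True, 2),   # should trigger, 2/3
    classify(True, 1),   # should trigger, 1/3
    classify(False, 0),  # should not trigger, 0/3
    classify(False, 1),  # should not trigger, 1/3
]
accuracy = sum(is_pass(label) for label in labels) / len(labels)
print(labels, accuracy)  # ['PASS', 'FLAKY', 'FAIL', 'PASS', 'FAIL'] 0.6
```

The rule is deliberately asymmetric: a should-trigger case tolerates one missed run, while a single spurious trigger fails a should-not-trigger case.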