{"skill":{"slug":"llm-evaluation","displayName":"LLM Evaluation","summary":"Deep LLM evaluation workflow: quality dimensions, golden sets, human vs automatic metrics, regression suites, offline/online signals, and safe rollout gates f...","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":178,"installsAllTime":1,"installsCurrent":1,"stars":0,"versions":1},"createdAt":1774397771175,"updatedAt":1774399914610},"latestVersion":{"version":"1.0.0","createdAt":1774397771175,"changelog":"llm-evaluation 1.0.0\n\n- Initial release of a comprehensive workflow for deep LLM evaluation.\n- Covers definition of quality dimensions, dataset/rubric development, automatic and human evaluation, regression gates, and online validation.\n- Guidance on when and how to apply the workflow, including trigger conditions and risk management.\n- Includes detailed stage-by-stage practices, checklists, and tips for robust, reproducible model assessment.\n- Tailored for use cases such as prompt/model updates, CI for LLM outputs, RAG, and agent evaluation.","license":"MIT-0"},"metadata":null,"owner":{"handle":"codenova58","userId":"s173fekm3yw84k7gp861dme7dd83gvyf","displayName":"codenova58","image":"https://avatars.githubusercontent.com/u/191358186?v=4"},"moderation":null}