{"skill":{"slug":"advanced-evaluation","displayName":"Advanced Evaluation","summary":"This skill should be used when the user asks to \"implement LLM-as-judge\", \"compare model outputs\", \"create evaluation rubrics\", \"mitigate evaluation bias\", o...","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":120,"installsAllTime":1,"installsCurrent":1,"stars":0,"versions":1},"createdAt":1775040510703,"updatedAt":1775041618292},"latestVersion":{"version":"1.0.0","createdAt":1775040510703,"changelog":"Initial release of advanced-evaluation, a comprehensive skill for building robust LLM evaluation systems.\n\n- Provides actionable guidance for implementing LLM-as-judge in automated pipelines.\n- Explains evaluation methods: direct scoring vs. pairwise comparison, with reliability and bias considerations.\n- Details systemic LLM biases (e.g., position, length, self-enhancement) and mitigation strategies.\n- Outlines metric selection frameworks for different evaluation tasks.\n- Supplies prompt templates and protocols for direct scoring, pairwise comparison, and rubric creation.\n- Offers practical patterns for evaluation pipeline design and rubric adaptation by domain.","license":"MIT-0"},"metadata":null,"owner":{"handle":"karmaent","userId":"s17ftg7vfcekcxtzxa4vqgfprh840axc","displayName":"KarmaENT","image":"https://avatars.githubusercontent.com/u/172393619?v=4"},"moderation":null}