{"skill":{"slug":"llm-eval-harness","displayName":"LLM Eval Harness","summary":"Evaluate LLM outputs systematically — run test suites, score responses for accuracy/relevance/safety, compare models, and detect regressions in AI applications.","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":31,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":1},"createdAt":1777509715414,"updatedAt":1777510912391},"latestVersion":{"version":"1.0.0","createdAt":1777509715414,"changelog":"- Initial release of llm-eval-harness.\n- Systematically evaluate LLM outputs across multiple dimensions: accuracy, relevance, safety, consistency, and helpfulness.\n- Run test suites and score responses using detailed rubrics.\n- Compare model performance side-by-side with metrics for accuracy, speed, cost, and safety.\n- Detect regressions after prompt or model updates and generate comprehensive evaluation reports.\n- Supports automated evaluation methods including string matching, semantic similarity, code execution, and LLM-as-judge.","license":"MIT-0"},"metadata":null,"owner":{"handle":"charlie-morrison","userId":"s17cttbdxry5kkyafjw983mq8s83p4y3","displayName":"charlie-morrison","image":"https://avatars.githubusercontent.com/u/271589886?v=4"},"moderation":null}