Install
openclaw skills install @xavierjiezou/virtual-tryon-scorer
Score and evaluate virtual try-on (VTON) results by analyzing identity preservation, garment fidelity, body consistency, and background stability. Use this skill whenever the user uploads virtual try-on images and wants quality assessment, scoring, or feedback. Trigger on phrases like "try-on scoring", "virtual try-on evaluation", "rate this try-on", "试穿打分", "试穿效果评价", "虚拟试穿评分", or when the user shares before/after clothing swap images and asks for quality judgment. Also trigger when the user mentions VTON, clothing transfer, garment swap, or outfit change evaluation, even if they don't explicitly say "virtual try-on".
Evaluate the quality of AI-generated virtual try-on results by comparing source person, target garment, and the generated output. Produce structured per-dimension scores with brief explanations, then aggregate into a weighted total score.
Users may provide images in two ways. Correctly identifying which format you're dealing with is critical — getting this wrong invalidates the entire evaluation.
The user uploads three distinct images: the source person, the target garment, and the generated try-on result.
When three images are provided, ask the user to clarify which is which if it's not obvious from context or filenames. Common naming patterns: "person/model/source", "cloth/garment/target", "result/output/generated".
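The filename check described above can be sketched roughly as follows. This is only an illustration, assuming filenames are available as strings; the role labels (`person`, `garment`, `result`) and the helper names `guess_role` and `assign_roles` are hypothetical and not part of the skill itself. Any ambiguous or unmatched filename falls back to asking the user.

```python
from pathlib import Path

# Keywords come from the naming patterns listed above. The role labels
# ("person", "garment", "result") are illustrative names only.
ROLE_KEYWORDS = {
    "person": ("person", "model", "source"),
    "garment": ("cloth", "garment", "target"),
    "result": ("result", "output", "generated"),
}


def guess_role(filename):
    """Return the single role whose keywords appear in the filename, else None."""
    stem = Path(filename).stem.lower()
    matches = [
        role
        for role, words in ROLE_KEYWORDS.items()
        if any(word in stem for word in words)
    ]
    # Zero matches or several matches both count as "not obvious".
    return matches[0] if len(matches) == 1 else None


def assign_roles(filenames):
    """Map role -> filename, or return None when the user should be asked."""
    roles = {}
    for name in filenames:
        role = guess_role(name)
        if role is None or role in roles:
            return None  # unknown or duplicated role: ask for clarification
        roles[role] = name
    return roles if len(roles) == len(ROLE_KEYWORDS) else None
```

A `None` result from `assign_roles` means the roles are not obvious from the filenames alone, which is the case where the user should be asked which image is which.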
The user uploads one image that contains all three photos stitched together (side by side, or in a grid layout). In this case:
Key distinction signals:
Score each dimension on a 0–100 scale. The dimensions are listed in order of importance, which also determines their weight in the final score.
Face identity preservation (weight 40%) is the single most important criterion. The person in the try-on result must be recognizably the same individual as in the source image. The face is the primary carrier of identity: if the face changes, the try-on is fundamentally broken regardless of how good everything else looks.
What to examine:
Scoring guide:
The clothing in the result should faithfully reproduce the target garment's visual characteristics and fit naturally on the person's body. This matters because the whole point of virtual try-on is to show how a specific garment looks on a specific person.
What to examine:
Scoring guide:
Beyond the face, other body characteristics should remain consistent between the source and result. This reinforces the overall sense that this is genuinely the same person, not a face-swap on a different body.
What to examine:
Scoring guide:
The background should remain stable between the source person image and the try-on result. Background changes are distracting and reduce the realism of the try-on.
What to examine:
Scoring guide:
Present the evaluation in this exact structure:
## 虚拟试穿效果评分报告
### 输入识别
- 输入格式:[三张独立图片 / 单张拼接图]
- 原始人物:[简要描述人物特征]
- 目标服装:[简要描述服装特征]
- 试穿结果:[简要描述试穿效果概况]
### 分项评分
#### 1. 人脸身份保持 (权重 40%)
- **得分:XX/100**
- 评价:[1-2 sentences explaining the score]
#### 2. 服装还原与贴合 (权重 30%)
- **得分:XX/100**
- 评价:[1-2 sentences explaining the score]
#### 3. 非人脸身体特征保持 (权重 20%)
- **得分:XX/100**
- 评价:[1-2 sentences explaining the score]
#### 4. 背景保持 (权重 10%)
- **得分:XX/100**
- 评价:[1-2 sentences explaining the score]
### 总分
- **加权总分:XX.X/100**
- 计算方式:(人脸 × 0.4) + (服装 × 0.3) + (身体 × 0.2) + (背景 × 0.1)
### 总体评价
[2-3 sentences summarizing the overall quality, highlighting the strongest and
weakest aspects, and suggesting what could be improved in the try-on pipeline]
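As a worked illustration of the weighted total in the report template, here is a minimal sketch. The dictionary keys (`face`, `garment`, `body`, `background`) and the function name `weighted_total` are assumptions made for this example; only the weights and the 0–100 scale come from the document.

```python
# Weights from the report template: face 40%, garment 30%, body 20%, background 10%.
# The key names are illustrative, not defined by the skill.
WEIGHTS = {"face": 0.4, "garment": 0.3, "body": 0.2, "background": 0.1}


def weighted_total(scores):
    """Combine four 0-100 dimension scores into one total, rounded to 0.1."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    total = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    return round(total, 1)


example = {"face": 90, "garment": 80, "body": 85, "background": 70}
print(weighted_total(example))  # 36 + 24 + 17 + 7 = 84.0
```

Because face identity carries 40% of the weight, a result with a face score of 40 and perfect scores elsewhere still totals only 76.0, which matches the intent that a changed face dominates the outcome.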
The scoring should be honest and calibrated. A few guiding principles:
Don't grade on a curve. A score of 95 should mean genuinely excellent quality, not just "better than average." Reserve scores above 90 for results that would fool a careful human observer.
Weight the dimensions as specified. Face identity (40%) dominates because a face change means the try-on has failed its core purpose — showing what this person would look like in that outfit. A technically perfect garment transfer with a different face is worthless.
Be specific in feedback. Don't just say "looks good" or "has issues." Point to concrete observations: "the nose bridge appears slightly narrower" or "the striped pattern on the left sleeve is distorted."
Consider the use case. Virtual try-on is a practical tool — users want to know if they'd look good in a piece of clothing before buying it. Evaluate from that perspective: would this result help someone make a confident purchase decision?
If the garment type doesn't match (e.g., the source wears a t-shirt and the result shows a completely different category like a dress), note this but still evaluate the result against the target garment reference.
If the image quality is very low, note the limitation and explain that the scores might not be fully reliable due to resolution constraints.
If the user only provides two images (missing one of the three), ask which one is missing and whether they can provide it. If they can't provide the garment reference, you can still evaluate face/body/background but should caveat the garment score.
If the try-on only changes part of the outfit (e.g., only the top), only evaluate the changed portion for garment fidelity, and note what was preserved from the original.
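The missing-image case above could be handled along these lines. This is a rough sketch under assumptions: the role labels and the `plan_evaluation` helper are hypothetical, and the document only spells out the fallback for a missing garment reference, so other missing images are simply reported back so the user can be asked.

```python
# Illustrative role labels for the three expected images (not defined by the skill).
REQUIRED = ("person", "garment", "result")


def plan_evaluation(provided):
    """Describe what to ask for and which scores need a caveat.

    `provided` is a collection of role labels the user has supplied.
    """
    missing = [role for role in REQUIRED if role not in provided]
    plan = {
        "missing": missing,
        "ask_user": bool(missing),  # always ask before falling back
        "caveats": [],
    }
    # Only the missing-garment fallback is described in the text: face, body,
    # and background can still be scored, with a caveat on the garment score.
    if missing == ["garment"]:
        plan["caveats"].append("garment score lacks a garment reference")
    return plan
```

For example, `plan_evaluation({"person", "result"})` reports the garment as missing and attaches the garment caveat, while a complete set of three images yields an empty plan with nothing to ask.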