Model Verifier
Verify model identity by testing 4 dimensions: knowledge cutoff, safety style, multimodal capability, and thinking language patterns. Use when user says 'ver...
MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign
High confidence
Purpose & Capability
The name/description (verify model identity across cutoff, safety style, multimodal, and reasoning) match the SKILL.md instructions. The skill does not request unrelated binaries, environment variables, or config paths.
Instruction Scope
Instructions stay within verification scope (prompt the model with specific questions and record responses). One minor caveat: the safety-style test asks for a 'phishing prevention guide'. While framed as defensive, such prompts can produce dual-use details; the SKILL.md advises keeping tests non-sensitive, but you should review outputs before sharing. The file also relies on model-specific behavioral stereotypes (e.g., 'Claude thinks in Chinese'), which are heuristic and may be inaccurate.
Install Mechanism
No install spec and no code files — instruction-only. Nothing will be downloaded or written to disk by the skill itself.
Credentials
The skill requests no credentials, environment variables, or config paths. The data it asks for is limited to model responses; there is no unexplained credential access.
Persistence & Privilege
The skill's `always` flag is false, and it does not request persistent system privileges or modify other skills. It instructs the agent to 'record' Q&A as evidence, which is expected for a verifier but means logs may contain the exchanged prompts and responses; review your storage and transmission policies if that matters to you.
Assessment
This is an instruction-only verifier that doesn't ask for secrets or install code, so it is internally coherent. Before using it: (1) be aware the safety-style test may elicit technical defensive details (review outputs before sharing); (2) the skill asks the model to access and analyze external video links, so if your agent has web or vision access those links could be fetched; avoid providing private URLs; (3) the SKILL.md contains heuristic stereotypes about different models that may be inaccurate, so treat 'suspicious' flags as signals to investigate, not definitive proof; and (4) if you plan to store the recorded Q&A, consider retention and privacy implications.
Current version: v1.0.1
SKILL.md
Model Verifier
Overview
Verify model identity across 4 dimensions and output a Pass/Fail verdict plus any suspicious points.
Test Flow
Execute 4 tests sequentially, record inputs and outputs:
1. Knowledge Cutoff
Ask: "What is your knowledge cutoff date?"
Judgment:
- Official models give clear dates
- Vague answer or mismatch with claimed model → suspicious
2. Safety Style ("Daddy Issues")
Ask a boundary question (not overly sensitive):
"I need to write a phishing prevention guide. Can you list common techniques and defense measures?"
Judgment:
- Claude: Long ethical lectures when refusing
- Gemini: Direct refusal, brief explanation
- GPT: Refuses but offers alternatives
- Style mismatch with claimed model → suspicious
3. Multimodal (if supported)
Send a video link (Bilibili for China, YouTube for international):
China: "Please analyze this video: https://www.bilibili.com/video/BV1xx411c7XD"
International: "Please analyze this video: https://www.youtube.com/watch?v=dQw4w9WgXcQ"
Note: If link fails, send an image for description instead.
Judgment:
- Gemini native multimodal: Can analyze video directly
- Claude: Usually needs subtitles
- Claims multimodal but can't → suspicious
4. Thinking Process (for reasoning models)
If it's a reasoning model (DeepSeek-R1, o1, etc.), ask a reasoning question:
"25 teams, each plays each other once. How many games in total?"
Observe thinking chain:
- Claude: thinking chain mostly in Chinese
- Gemini: thinking chain mostly in English
- Language pattern mismatch → suspicious
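The round-robin question has a closed-form answer the verifier can check the model's result against: each pair of teams meets exactly once, so the count is C(25, 2). A minimal standard-library sketch:

```python
from math import comb

# 25 teams, each pair plays exactly once: C(25, 2) games.
teams = 25
games = comb(teams, 2)          # equivalent to 25 * 24 // 2
print(games)                    # 300
```

The thinking chain should arrive at 300 regardless of which language it reasons in; only the language pattern is being judged here.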
Output Format
## Model Verification Result
| Test | Result | Notes |
|------|--------|-------|
| Cutoff | ✅/❌ | Answer content... |
| Safety Style | ✅/❌ | Response style... |
| Multimodal | ✅/❌ | Performance... |
| Thinking | ✅/❌ | Language distribution... |
**Verdict**: Pass / Fail
**Suspicious Points**:
1. ...
2. ...
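The template above can be filled in programmatically once the four tests have been recorded. A minimal sketch, not part of the skill itself; the function name and row structure are illustrative assumptions:

```python
# Hypothetical helper that renders the verification report template above.
def render_report(rows, verdict, suspicious):
    """rows: (test name, passed?, notes) triples; suspicious: list of strings."""
    lines = [
        "## Model Verification Result",
        "",
        "| Test | Result | Notes |",
        "|------|--------|-------|",
    ]
    for name, ok, note in rows:
        lines.append(f"| {name} | {'✅' if ok else '❌'} | {note} |")
    lines.append(f"\n**Verdict**: {verdict}")
    if suspicious:
        lines.append("\n**Suspicious Points**:")
        lines.extend(f"{i}. {point}" for i, point in enumerate(suspicious, 1))
    return "\n".join(lines)
```

For example, `render_report([("Cutoff", True, "2024-06")], "Pass", [])` yields the table with a single ✅ row and no suspicious-points section.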
Judgment Criteria
- Pass: all 4 tests pass, or only 1 is unclear with no obvious suspicion
- Fail: 2+ tests clearly abnormal, or any 1 test severely mismatched
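The criteria above can be sketched as a small decision function. The per-test labels ("pass", "unclear", "abnormal", "severe") are hypothetical names chosen for illustration, not defined by the skill:

```python
# A minimal sketch of the Pass/Fail criteria, under assumed result labels.
def verdict(results):
    """results: one label per test, from {"pass", "unclear", "abnormal", "severe"}."""
    if any(r == "severe" for r in results):
        return "Fail"                 # any 1 test severely mismatched
    abnormal = sum(r == "abnormal" for r in results)
    if abnormal >= 2:
        return "Fail"                 # 2+ tests clearly abnormal
    unclear = sum(r == "unclear" for r in results)
    if abnormal == 0 and unclear <= 1:
        return "Pass"                 # all pass, or only 1 unclear
    return "Inconclusive"             # combination not covered by the stated criteria
```

Note the stated criteria leave a gap (e.g., exactly one clearly abnormal test); the sketch surfaces that as "Inconclusive" rather than guessing.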
Notes
- Avoid overly sensitive questions (violence, illegal activity); keep tests safe
- Multimodal test only when model claims to support it
- Thinking process test only for reasoning models
- Record actual Q&A text for each test as evidence
