Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Wei Cross Research

v1.0.0

Cross-validate research answers by querying multiple LLMs in parallel with judge-based synthesis. Reduces hallucination and surfaces model disagreements for...

by Wei Han (@mikehankk)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for mikehankk/wei-cross-research.

Prompt Preview: Install & Setup
Install the skill "Wei Cross Research" (mikehankk/wei-cross-research) from ClawHub.
Skill page: https://clawhub.ai/mikehankk/wei-cross-research
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install mikehankk/wei-cross-research

ClawHub CLI

Package manager switcher

npx clawhub@latest install wei-cross-research
Security Scan
Capability signals
Requires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal
Benign
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The skill actually implements multi‑model querying and judge‑based synthesis (clients for OpenRouter, Bailian/DashScope, and an OpenAI‑compatible client are present), which aligns with the description. However, the published registry metadata declared no required environment variables or primary credential, while SKILL.md and config.json clearly expect API keys (OPENROUTER_API_KEY and/or DASHSCOPE_API_KEY). That mismatch is an incoherence: the skill will need external model API keys but the registry listing does not advertise them.
Instruction Scope
SKILL.md and the scripts instruct the agent to read config.json and prompt templates, query remote model endpoints, and write outputs to intermediate/ and reports/ files (expected). The docs instruct creating a .env file to store API keys (OPENROUTER_API_KEY, DASHSCOPE_API_KEY) — those are relevant to functionality but are sensitive. The instructions also include running a remote install script (curl https://bun.sh/install | bash), which expands the scope of runtime actions beyond the repo code. The agent implementation includes input sanitization against prompt‑injection patterns, and model outputs are saved locally (intermediate files) which could contain user‑supplied or model‑returned sensitive data.
Install Mechanism
There is no formal install spec in the registry (instruction‑only), but SKILL.md instructs users to run a remote bootstrapper (curl https://bun.sh/install | bash) to install Bun, then run bun install. Running an arbitrary remote install script is higher risk than using a reviewed package manager invocation. The Node dependencies are limited (axios, dotenv), which is proportional, but the remote install step is a notable risk that the user should not run blindly.
Credentials
The only sensitive environment variables referenced are OPENROUTER_API_KEY and DASHSCOPE_API_KEY, which are appropriate for a tool that calls OpenRouter and DashScope/Bailian. That is proportionate to its purpose. However, the registry metadata did not declare these required env vars; SKILL.md does — the inconsistency could mislead users into thinking no credentials are needed. The code will read config.json and expect api_key_env names there; ensure you only provide least‑privilege keys and do not store unrelated secrets in .env.
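The credential pattern described above (config entries that name an env var via api_key_env, read at runtime) can be sketched as follows. The field name api_key_env and the two variable names come from the skill's docs; the ModelConfig shape and resolveApiKey helper are illustrative assumptions, not the skill's actual code.

```typescript
// Illustrative sketch: resolve an API key from the env var named in config.json.
// "api_key_env" is the field name the skill's docs describe; the rest is assumed.
interface ModelConfig {
  provider: string;
  api_key_env: string; // e.g. "OPENROUTER_API_KEY" or "DASHSCOPE_API_KEY"
}

function resolveApiKey(model: ModelConfig): string {
  const key = process.env[model.api_key_env];
  if (!key) {
    // Fail fast with the variable name so the user knows which credential to set.
    throw new Error(`Missing credential: set ${model.api_key_env} before running the skill`);
  }
  return key;
}
```

Auditing for this pattern (env read followed by a network send) is exactly what the scan findings below point at; confirm the key is only sent to the documented provider endpoint.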
Persistence & Privilege
The skill does not request always:true or other elevated platform privileges. It writes reports and intermediate model outputs to local project directories (intermediate/, reports/) and does not appear to modify other skills or global agent configuration. Autonomous invocation is allowed (platform default) but is not combined with unusually broad credentials or persistent privileges.
Scan Findings in Context
[ignore-previous-instructions] expected: The static pre-scan flagged prompt‑injection text. The code and SKILL.md explicitly reference prompt‑injection patterns and include sanitization logic that censors 'ignore previous instructions' and similar phrases. The presence of such patterns is likely defensive (the agent strips injection phrases) rather than an attempt to inject, but the flagged pattern warrants review because prompt injection content appears in docs/code.
What to consider before installing
What to check before installing or running this skill:

  1. Registry vs docs: The skill really needs API keys (OPENROUTER_API_KEY and/or DASHSCOPE_API_KEY) even though the registry metadata lists none — don't assume no credentials are required. Provide only least-privilege keys.
  2. Remote installer caution: SKILL.md recommends running a remote install script (curl https://bun.sh/install | bash). Avoid running arbitrary bootstrap scripts on sensitive machines — prefer to install Bun from your distro/package manager, or inspect the script first and run it in an isolated environment (VM/container).
  3. Review the client code: Inspect scripts/clients/* (openrouter, bailian, openai_compliant) to confirm they call only the documented API endpoints and that no hardcoded or hidden remote endpoints exist. Ensure request/response handling doesn't leak data to unexpected endpoints.
  4. Data at rest: The agent saves intermediate and report files under intermediate/ and reports/. These may contain model outputs and any user input. If you will process sensitive queries, run in an isolated directory or container and ensure filesystem permissions meet your requirements.
  5. Prompt injection strings: The project contains strings and sanitization for prompt-injection patterns. While this is defensive, verify that the sanitization is sufficient for your threat model and that you are comfortable with how user input is forwarded to external models.
  6. Run minimally and audit: First run a benign test query in an isolated environment while monitoring network egress and logs. If, after inspection, you accept the small inconsistencies (missing declared env vars) and the remote install step, the code appears coherent with its stated purpose; otherwise avoid installing on production systems.
scripts/clients/bailian.ts:136
Environment variable access combined with network send.
scripts/clients/openai_compliant.ts:152
Environment variable access combined with network send.
scripts/clients/openrouter.ts:120
Environment variable access combined with network send.
Patterns worth reviewing
These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.


Tags: consensus · latest · multi-model · reasoning · research
101 downloads
0 stars
1 version
Updated 1w ago
v1.0.0
MIT-0

Wei Cross Research Skill

Version: 1.0.0 | Last updated: 2026-04-19

Overview

Use wei-cross-research when you need a reliable answer — not just one model's opinion.

This skill queries multiple LLMs in parallel and uses a judge model to synthesize their responses into a single cross-validated answer. When models agree, confidence is high. When they disagree, the disagreement is surfaced — not silently resolved.

Best for:

  • High-stakes questions where a wrong answer has real consequences
  • Topics where a single model may have blind spots or biases
  • Analysis that benefits from multiple independent viewpoints
  • Reducing hallucination via cross-model comparison

Cost note: This skill queries 2–3 models per request. Expect approximately 2–3x the token usage of a single-model query. Use it when answer quality justifies the cost; avoid it for simple or low-stakes questions.


Requirements

Install Bun

curl -fsSL https://bun.sh/install | bash

Environment Variables

Create a .env file in the project root or set environment variables:

# Required for OpenRouter models (answering models)
OPENROUTER_API_KEY=your_openrouter_api_key

# Required for judge models (Bailian/DashScope)
DASHSCOPE_API_KEY=your_dashscope_api_key

Note: If you don't have an OpenRouter API key, you can modify config.json to use other providers. Change the provider field from "openrouter" to "bailian" or "openai_compliant" and update the api_key_env to point to your available API key.
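As a sketch of that change — the provider and api_key_env fields are named in the note above, but the surrounding structure of config.json (including the "name" field) is assumed here for illustration:

```json
{
  "models": [
    {
      "name": "example-model",
      "provider": "bailian",
      "api_key_env": "DASHSCOPE_API_KEY"
    }
  ]
}
```

Switching provider to "bailian" or "openai_compliant" and pointing api_key_env at a variable you actually have set is the whole change.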

Configuration Files

Having trouble accessing certain models? See README.md for how to choose and configure config.json based on your network environment.

Install Dependencies

bun install

Usage

bun run scripts/index.ts "your question"
bun run scripts/index.ts -t financial "美联储2026年会降息吗?"

Domain-Specific Judges

When queryType is set to "financial", the judge step uses a finance-specialized prompt that produces:

  • Base Case Analysis — probabilistic scenario with data-driven reasoning
  • Bull Case — arguments for upside scenario
  • Bear Case — arguments for downside scenario
  • Key Variables / Risks — macro events, earnings, policy changes, market sentiment

This avoids deterministic predictions and enforces probability ranges (e.g., 60–70% likelihood). Use it for investment, macroeconomic, and market analysis questions.

Example:

{
  "query": "美联储2026年会降息吗?",
  "queryType": "financial"
}

Supported Models

All models are accessed via OpenRouter or other configured providers. Answering models may use live retrieval depending on the provider configuration.

The system selects 2–3 answering models in parallel (based on roles) and uses a judge model to synthesize the final response.

Model Roles

Each model in config.json is tagged with one or more roles indicating its capabilities:

Role        | Description                   | Typical Use
retrieval   | Has web/live data access      | Current events, real-time info
coding      | Strong programming capability | Technical implementation, debugging
social      | Social media data access      | X/Twitter sentiment, trending
reasoning   | Deep analytical capability    | Complex analysis, synthesis
creative    | Creative writing strength     | Storytelling, open-ended tasks
longcontext | Large context window          | Document analysis, long inputs
general     | Broad balanced capability     | Fallback, ambiguous queries
judge       | Answer synthesis              | Final synthesis (judge models only)

Note: Specific model names and their roles are defined in the models section of config.json. Refer to that file for the current model roster.

Judge Models

Judge models synthesize answers already in context and normally do not require retrieval.

They are configured in config.json with role "judge" and selected via the judge_model config key.

Judge models are independent of answering models and may synthesize outputs from any answering pool.


Model Selection

Model selection is controlled via config.json using a roles-based routing system. Instead of hard-coding model names, you select models by the capabilities (roles) they provide.

How to Select Models

As the calling model, follow this process:

  1. Classify the query — Match keywords to determine the queryType
  2. Pass queryType — The skill looks up routing.<queryType>.models in config.json
  3. (Optional) Pass explicit models — Use the models parameter to bypass auto-selection

Query Types (Domain)

queryType      | Description
financial      | Markets, investing, macroeconomics
technical      | Programming, systems, engineering
social         | Public opinion, social media sentiment
current_events | Recent news and real-time information
scientific     | Objective knowledge, definitions, theories
creative       | Writing, design, ideation
general        | Default fallback

Intent (Task Type)

In addition to queryType, queries may include an optional intent field. queryType defines the domain (what the question is about), while intent defines the task (what to do with the question). If intent is not provided, the system defaults to analysis for complex queries and lookup for simple factual queries.

intent     | Description
lookup     | Retrieve factual information
analysis   | Deep reasoning and explanation
comparison | Compare multiple entities
prediction | Forecast future outcomes (used in financial)
generation | Create content (text, ideas, design)

Example:

{ "query": "美联储2026年会降息吗?", "queryType": "financial", "intent": "prediction" }

Selection Algorithm

1. Analyze query → match keywords → determine queryType
2. Pass queryType to skill → skill looks up `routing.<queryType>.models` in config.json
3. Skill selects top 2–3 models from the routing config
4. If queryType === 'financial', skill uses judge_financial.txt for synthesis
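The lookup in steps 2–3 can be sketched as below. The routing.<queryType>.models shape and the fallback pair (glm-5 + kimi-k2.5) come from this document; the function itself and its exact selection rule are illustrative assumptions.

```typescript
// Illustrative sketch of the roles-based routing lookup described above.
interface RoutingConfig {
  routing: Record<string, { models: string[] }>;
}

// Default fallback pair, per the "Failure Handling" section.
const FALLBACK_MODELS = ["glm-5", "kimi-k2.5"];

function selectModels(queryType: string, config: RoutingConfig, maxModels = 3): string[] {
  // Unknown query types fall through to "general", then to the fallback pair.
  const entry = config.routing[queryType] ?? config.routing["general"];
  const pool = entry?.models ?? FALLBACK_MODELS;
  // Take the top 2–3 models from the routing config, as step 3 describes.
  return pool.slice(0, Math.max(2, Math.min(maxModels, pool.length)));
}
```

Passing an explicit models parameter (step 3 of "How to Select Models") would bypass this lookup entirely.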

Examples

Example 1: Financial Query

Query: "美联储2026年会降息吗?"

Selection process:

  1. Keywords: 美联储, 降息 → queryType: financial
  2. Pass to skill: { "query": "...", "queryType": "financial" }
  3. Skill looks up: routing.financial.models in config.json
  4. Skill selects: First 2 models from the routing config
  5. Judge: Uses judge_financial.txt (Bull/Bear/Base Case analysis)

Example 2: Technical Query

Query: "How do I implement a distributed transaction?"

Selection process:

  1. Keywords: implement, distributed → queryType: technical
  2. Pass to skill: { "query": "...", "queryType": "technical" }
  3. Skill looks up: routing.technical.models in config.json
  4. Skill selects: Models configured for technical queries

Example 3: Social Query

Query: "What are people saying about SpaceX on Twitter?"

Selection process:

  1. Keywords: Twitter, saying → queryType: social
  2. Pass to skill: { "query": "...", "queryType": "social" }
  3. Skill looks up: routing.social.models in config.json
    • Note: grok-4.1 has social, sentiment, trending roles + X data access

When to Reference Specific Models

Only hard-code model names when:

  1. Special data access — e.g., grok-4.1 for X/Twitter data, kimi-k2.5 for 200K context
  2. Known strengths — e.g., qwen3.5 for coding tasks based on benchmarks
  3. Avoiding specific models — e.g., excluding models known to underperform for certain tasks

In these cases, document why that specific model is needed, not just its name.


When To Use This Skill

Use this skill when:

  • The user asks a complex research question
  • The question requires high confidence or cross-validation
  • The topic has multiple competing viewpoints
  • A factual error would have significant consequences

Do NOT use this skill for:

  • Simple factual lookups
  • Quick definitions or summaries
  • Trivial tasks a single model can answer reliably
  • Time-sensitive queries where 8–15s latency is unacceptable

Skill Parameters

Parameter  | Type   | Description
query      | string | The research question
queryType  | string | Domain classification (financial, technical, etc.)
intent     | string | Task type (analysis, prediction, etc.)
models     | array  | Override automatic model selection
maxModels  | number | Maximum number of answering models
depth      | string | simple / tree
judgeModel | string | Override the configured judge model

Note: The domain parameter has been deprecated. Use queryType: 'financial' instead for financial queries.

Example:

{
  "query": "What are the economic impacts of AI agents?",
  "queryType": "general",
  "intent": "analysis"
}
{
  "query": "美联储2026年会降息吗?",
  "queryType": "financial",
  "maxModels": 2
}

Depth Modes

Mode             | Behavior                                                                      | Use When
simple (default) | Single-pass: each model answers the query once, judge synthesizes             | Most research questions
tree             | Multi-pass: follow-up sub-queries are generated and answered before synthesis | Complex topics requiring decomposition (adds ~10–20s latency)

Output Format

Success (all models respond):

{
  "query": "user question",
  "models_used": ["glm-5", "kimi-k2.5"],
  "answers": [
    { "model": "glm-5", "answer": "..." },
    { "model": "kimi-k2.5", "answer": "..." }
  ],
  "final_answer": "...",
  "confidence": 0.85
}

Partial failure (one model timed out or errored):

{
  "query": "user question",
  "models_used": ["glm-5"],
  "models_failed": [
    { "model": "kimi-k2.5", "reason": "timeout" }
  ],
  "answers": [
    { "model": "glm-5", "answer": "..." }
  ],
  "final_answer": "...",
  "confidence": 0.61,
  "warning": "Synthesis based on partial responses. Confidence may be reduced."
}

Full failure:

{
  "query": "user question",
  "models_used": [],
  "models_failed": [
    { "model": "glm-5", "reason": "timeout" },
    { "model": "kimi-k2.5", "reason": "api_error" }
  ],
  "final_answer": null,
  "error": "All models failed. Please retry."
}

Confidence scale: All confidence values use a 0–1 scale (e.g., 0.85 = 85% confidence). This applies consistently across normalizer outputs and judge outputs.


Result Files

Each run produces files identified by a shared timestamp in YYYY-MM-DDTHH-MM-SS format (ISO 8601, colons replaced with hyphens).

The timestamp is logged at the start of execution:

[ResearchAgent] Timestamp: 2026-03-19T14-30-05
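The timestamp format above (ISO 8601 with colons replaced by hyphens) can be sketched in a couple of lines. The format and the reports/report-{timestamp}.txt path come from this document; the helper names are illustrative.

```typescript
// Sketch of the shared run timestamp: YYYY-MM-DDTHH-MM-SS
// (ISO 8601 truncated to seconds, colons replaced with hyphens).
function runTimestamp(date: Date = new Date()): string {
  return date.toISOString().slice(0, 19).replace(/:/g, "-");
}

// Build the report path for a run, mirroring the File Locations table.
function reportPath(timestamp: string): string {
  return `reports/report-${timestamp}.txt`;
}
```

Because every file from one run shares this timestamp, a single glob like `*-2026-03-19T14-30-05.txt` gathers a complete run.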

File Locations

File            | Path                                 | Content
Report          | reports/report-{timestamp}.txt       | Final synthesized answer from judge
Model responses | intermediate/{model}-{timestamp}.txt | Raw response from each answering model
Judge raw       | intermediate/{judge}-{timestamp}.txt | Raw judge synthesis output

Example

For a run at 2026-03-19T14:30:05 with models kimi-k2.5 and gpt-5.4, judge glm-5:

reports/report-2026-03-19T14-30-05.txt        ← final answer
intermediate/kimi-k2.5-2026-03-19T14-30-05.txt
intermediate/gpt-5.4-2026-03-19T14-30-05.txt
intermediate/glm-5-judge-raw-2026-03-19T14-30-05.txt

Use the timestamp from console output to locate all files from a specific run.


Performance Characteristics

Stage                      | Typical Latency
Router                     | ~1s (skipped when models passed directly)
Model inference (parallel) | 20–100s
Judge synthesis            | 20–60s
Total                      | 40–120s

Timeout per model: 60–120 seconds
Retries per model: 1


Failure Handling

The skill tolerates partial failures:

  • If a model times out or errors, the skill continues with remaining responses
  • The judge synthesizes available answers and notes missing models in output
  • If all models fail, a structured error is returned (see Output Format above)
  • The router has a default fallback pair (glm-5 + kimi-k2.5) if routing fails
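The tolerance described above maps naturally onto Promise.allSettled: model calls run in parallel, failures are collected instead of aborting the run, and only a total failure raises an error. The types and function names here are illustrative, not the skill's actual API.

```typescript
// Sketch of partial-failure handling: query all models in parallel,
// keep successful answers, record failures, throw only if everything fails.
interface ModelAnswer { model: string; answer: string; }
interface ModelFailure { model: string; reason: string; }

async function queryAll(
  models: string[],
  ask: (model: string) => Promise<string>
): Promise<{ answers: ModelAnswer[]; failed: ModelFailure[] }> {
  const settled = await Promise.allSettled(models.map((m) => ask(m)));
  const answers: ModelAnswer[] = [];
  const failed: ModelFailure[] = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      answers.push({ model: models[i], answer: result.value });
    } else {
      failed.push({ model: models[i], reason: String(result.reason) });
    }
  });
  if (answers.length === 0) {
    // Mirrors the "Full failure" shape in Output Format.
    throw new Error("All models failed. Please retry.");
  }
  return { answers, failed };
}
```

The judge then synthesizes whatever landed in answers, and failed feeds the models_failed array shown in the Output Format section.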

Security Notes

  • User-supplied query values are included in prompts sent to external model APIs. Avoid passing unsanitized inputs from untrusted sources.
  • The skill does not validate or filter query content — callers are responsible for input sanitization upstream.
  • Do not include secrets, PII, or confidential data in queries unless the target model APIs are approved for that data classification.

Quality Evaluation

A synthesized answer is considered high quality when:

  • Consensus points across models are clearly identified
  • Disagreements are surfaced (not silently resolved)
  • Confidence ≥ 0.75
  • The judge does not fabricate citations or sources

For ongoing quality tracking, log confidence, models_used, and models_failed per request.
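A minimal sketch of that per-request tracking, assuming the record fields from the Output Format section. The ≥ 0.75 threshold comes from the criteria above; the requirement that at least two models contributed is an added assumption (cross-validation needs more than one answer), not a documented rule.

```typescript
// Illustrative per-request quality record; fields mirror the Output Format section.
interface RunRecord {
  query: string;
  confidence: number; // 0–1 scale, per the Confidence scale note
  models_used: string[];
  models_failed: { model: string; reason: string }[];
}

function isHighQuality(run: RunRecord, threshold = 0.75): boolean {
  // Confidence meets the documented bar AND at least two models actually
  // answered (assumed: a single-model run cannot cross-validate).
  return run.confidence >= threshold && run.models_used.length >= 2;
}
```

Logging RunRecord objects per request makes it easy to chart confidence drift and failure rates over time.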


Best Practices

Recommended Model Combinations

intent     | Role Combination       | Example
lookup     | retrieval + general    | Quick factual lookup + balanced fallback
analysis   | reasoning + retrieval  | Deep analysis + live data context
prediction | reasoning + synthesis  | Forecast with multi-source synthesis
comparison | reasoning + structured | Evaluate options systematically
generation | creative + synthesis   | Create + refine output

queryType | Recommended Roles    | Why
financial | retrieval + research | Live data + analysis
technical | coding + general     | Technical + broader context
social    | social + retrieval   | Sentiment + current context
creative  | creative + synthesis | Generate + refine

Why Role Diversity Matters

Combining models with different roles improves reliability:

  • retrieval + reasoning: Up-to-date facts + deep analysis
  • coding + general: Technical accuracy + broader context
  • social + retrieval: Platform-specific sentiment + general web context

Benefits:

  • Higher reliability through capability diversity
  • Reduced hallucination via cross-validation
  • Improved reasoning quality on ambiguous topics

Example Usage

use cross-research

query="What are the major AI breakthroughs in the past 12 months?"
queryType="current_events"

Selection process:

  1. Keywords: "past 12 months" → implies current_events
  2. Pass queryType: "current_events" to skill
  3. Skill looks up routing.current_events.models in config.json
  4. Judge synthesizes responses

Example result:

Final Answer:
AI breakthroughs in the last year include...

Consensus:
- Agent frameworks matured significantly
- Multimodal models expanded in capability
- Inference costs decreased substantially

Confidence: 0.87

Changelog

Version | Changes
1.0.0   | Initial release
