Cn Client Investigation

v0.9.6

China mainland client investigation and banker-grade analysis with strict guards for Chinese text accuracy and data provenance. Use when the target is an A-s...


by jackdark @jackdark425

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for jackdark425/cn-client-investigation.


Install the skill "Cn Client Investigation" (jackdark425/cn-client-investigation) from ClawHub.
Skill page: https://clawhub.ai/jackdark425/cn-client-investigation
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install jackdark425/cn-client-investigation

ClawHub CLI


npx clawhub@latest install cn-client-investigation

Security Scan

Capability signals

Crypto · Requires sensitive credentials

These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.

VirusTotal

Suspicious


OpenClaw

Suspicious

high confidence

Purpose & Capability

The name/description (China client investigation, banker-grade QA) aligns with included lexicon, PDF/market-data checks, and the provenance/typo-scan tooling. However, the skill's code expects local OpenClaw configuration and external MCP/tokens (e.g. PrimeMatrix bridge, Tushare) while the registry metadata lists no required environment variables or credentials. That mismatch (no declared credentials but code that uses/exports MCP_API_KEY, TUSHARE_TOKEN, and reads ~/.openclaw/openclaw.json) is incoherent and requires explanation.

Instruction Scope

SKILL.md describes using web_fetch and specific MCP tools and mandates running provenance/typo scanners — appropriate for the purpose. But runtime instructions and included scripts (e.g. provenance_verify.py, cn_typo_scan.py, build_deck.py, bj_smoke_v2.py) read local config, shell out to node/bridge processes, and expect the agent or operator to edit skill files (e.g. add lexicon entries by editing references/cn-lexicon.js). Allowing automated edits to shipped skill files and reading ~/.openclaw/openclaw.json expands scope beyond 'analysis only' and should be explicitly documented/justified.


Install Mechanism

There is no install spec (instruction-only), but many included scripts require runtime dependencies (node + pptxgenjs, python3 + python-pptx and other python libs) and expect certain files at user-specific paths. The absence of a declared install step or dependency list is a practical gap: users must manually satisfy runtime deps, and the skill bundle contains executable scripts that will be used at runtime.

Credentials

The registry metadata declares no env vars; yet scripts reference and use credentials: bj_smoke_v2.py reads TUSHARE_TOKEN (with a hard-coded default token), multiple scripts read ~/.openclaw/openclaw.json to obtain 'PRIMEMATRIX_MCP_API_KEY' and 'PRIMEMATRIX_BASE_URL' and then pass them into subprocesses (node bridge). Additionally, bj_smoke_v2.py explicitly pops HTTP_PROXY/HTTPS_PROXY/ALL_PROXY from os.environ, preventing proxy routing (this is a red flag because it bypasses local proxy/monitoring). These behaviors (undeclared credential use, hard-coded token, and proxy removal) are disproportionate unless the skill explicitly documents which secrets it needs and why.


Persistence & Privilege

always:false and user-invocable are appropriate. The skill does read user config (~/.openclaw/openclaw.json) and writes deliverable files and lexicon edits into the skill's files/directories — expected for a tool that maintains a lexicon. It does not declare modifying other skills or system-wide settings. Still, allowing agent-driven editing of files included in the skill gives it write capability to its own code bundle and should be scoped/controlled.

What to consider before installing

Key things to consider before installing or running this skill:

- The skill metadata declares no required credentials, but the code expects them. Ask the publisher to explicitly list required env vars (e.g. TUSHARE_TOKEN, PRIMEMATRIX_MCP_API_KEY, PRIMEMATRIX_BASE_URL) and explain how they are used.
- The scripts read your ~/.openclaw/openclaw.json to pull MCP API keys and then pass them into subprocesses (node bridge). If you install, inspect that file for secrets and consider using an account with least privilege or a dedicated service account.
- bj_smoke_v2.py includes a hard-coded TUSHARE_TOKEN default in the source and removes HTTP_PROXY/HTTPS_PROXY/ALL_PROXY from the environment. Both are red flags. Request removal of the hard-coded token and of the proxy-evading behavior, or require an explicit justification.
- The package has no declared install steps or dependency list. Verify required runtimes (node, pptxgenjs, python3, python-pptx, etc.) and preferably run the skill in an isolated environment (sandbox or VM) first.
- Because the skill allows editing of references/cn-lexicon.js (the repo files), ensure you control what edits are permitted; an agent with write access could modify its own behavior. Limit automated agent-write permissions or require human review of any commits to the skill files.
- Operational mitigations: restrict the skill's autonomous invocation if unsure, audit network calls when running (or run with outbound network blocked until you review), rotate any tokens before/after use, and prefer providing credentials via ephemeral, least-privilege accounts.

Overall: the code mostly fits the stated purpose, but undocumented credential access, proxy disabling, and missing dependency/credential declarations make this suspicious. Get clarifications and fixes before using it in a production environment.

Like a lobster shell, security has layers — review code before you run it.


MIT-0

CN Client Investigation Skill

China mainland client investigation and analysis: dual assurance for text and data

Use this skill when the target company is a China-market entity and the deliverable (MD / PPTX / Excel) must be free of Chinese character-level typos and based on verifiable Chinese-market data sources.

Why this skill exists

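In a live test on 2026-04-18 analyzing 寒武纪 (Cambricon, 688256.SH), MiniMax-M2.7, while writing pptxgenjs \uXXXX escape sequences, turned "寒武纪" into "宽厭谛79" and "净利/财务/亏损" into "洁利贜务贜损". The root cause is token-level BPE splitting of rare Chinese words combined with character-level misalignment. This skill provides two lines of defense:

Pre-computed Chinese lexicon: key company names, industry terms, and financial terms are hard-coded in this skill's reference file. When the agent generates slide JS it reads them directly via require('./references/cn-lexicon.js'), so the model never generates them character by character.
UTF-8 literal over Unicode escape: in JS source, write slide.addText("寒武纪科技") rather than slide.addText("\u5BD2\u6B66\u7EAA\u79D1\u6280"). Literal Chinese travels as UTF-8 bytes and never passes through the \uXXXX decoding path, which bypasses MiniMax's main typo channel.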

Mandatory guardrails

Rule 1 — No `\uXXXX` escape for Chinese text (text accuracy)

In every slide-NN.js / markdown / Python code you emit, write Chinese characters as UTF-8 literals. NEVER encode Chinese as \uXXXX escape sequences.

// ✅ CORRECT — UTF-8 literal
slide.addText("寒武纪科技深度分析", { fontSize: 44, fontFace: "Microsoft YaHei" });

// ❌ WRONG — Unicode escape, MiniMax token-level typo risk
slide.addText("\u5BD2\u6B66\u7EAA\u79D1\u6280...", { ... });

All source files must be saved as UTF-8 no BOM. The write / edit tools accept raw Chinese fine; do not pre-encode to \uXXXX thinking it's "safer" — it is the opposite.
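A minimal sketch of how a pre-commit check for Rule 1 might look, assuming the generated sources are available as text. The name `find_cjk_escapes` and the restriction to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) are assumptions for illustration; this is not one of the skill's shipped scripts.

```python
import re

# Matches a literal backslash-u escape whose code point falls in the
# basic CJK Unified Ideographs block (U+4E00..U+9FFF).
CJK_ESCAPE = re.compile(r"\\u(?:4[EFef][0-9A-Fa-f]{2}|[5-9][0-9A-Fa-f]{3})")


def find_cjk_escapes(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that contain an escaped CJK character."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if CJK_ESCAPE.search(line):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Raw strings keep the backslash sequences literal, as they would be on disk.
    sample = 'slide.addText("寒武纪科技", {});\n' + r'slide.addText("\u5BD2\u6B66", {});'
    for lineno, line in find_cjk_escapes(sample):
        print(f"L{lineno}: escaped Chinese text: {line}")
```

Escapes outside the CJK range (for example `\u0041`) are deliberately ignored, since the rule targets Chinese text only.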

Rule 2 — Lexicon lookup before typing key terms (text accuracy)

Before emitting any of these classes of Chinese terms in code, look them up in references/cn-lexicon.js:

Target company full name (e.g. 寒武纪科技、海光信息、摩尔线程智能科技)
Section headers (公司概览、财务分析、竞争格局、估值分析、投资结论、投资亮点、风险分析)
Financial line items (营业收入、归母净利润、扣非净利润、毛利率、净利率、研发费用、经营现金流)
Investment recommendations (增持 / 中性 / 减持 / 买入 / 卖出 / 积极关注)
Market / product terms (智算中心、AI加速、国产替代、CUDA兼容、实体清单、港交所递表)

If the term you need is not in the lexicon, add it to the lexicon first (one commit), then reference it. Don't type Chinese from memory in a \uXXXX form. See references/cn-lexicon.js.

Rule 3 — Cover page English primary, Chinese secondary (text safety shield)

Cover slides MUST use the English company name as 44pt hero title and Chinese name as ≤28pt subtitle. English ASCII has zero escape-typo risk; the Chinese subtitle is shorter and easier to sanity-check.

slide.addText("Cambricon Technologies", { fontSize: 44, fontFace: "Georgia", bold: true });  // large English hero title
slide.addText("寒武纪科技深度分析", { fontSize: 26, fontFace: "Microsoft YaHei" });  // smaller Chinese subtitle

Rule 4 — Data source hierarchy (data accuracy)

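Data on Chinese companies MUST come from these tiers, in order. Only drop to a lower tier if the higher tier returns empty / 402 / 403:

Tier	Source	MCP tool / method
T1: exchange / regulatory disclosure	巨潮资讯 (cninfo.com.cn): original PDF text of prospectuses / annual reports / quarterly reports	`web_fetch` on cninfo URL, parse PDF text
T1: market data	Tushare Pro	`aigroup-market-mcp__company_performance`, `aigroup-market-mcp__stock_data`, `aigroup-market-mcp__basic_info`
T2: company announcements	Official sites of the Shanghai Stock Exchange (上交所) / Shenzhen Stock Exchange (深交所) / HKEX (港交所)	`web_fetch` on sse/szse/hkex
T2: business registration	天眼查 / 企查查 (if the MCP is installed) / National Enterprise Credit Information Publicity System (国家企业信用公示系统)	`aigroup-tianyancha-mcp` or `web_search`
T3: third-party data	Wind / 同花顺 / 东方财富 / FMP / Finnhub (Hong Kong stocks / US-listed Chinese stocks)	`aigroup-fmp-mcp`, `aigroup-finnhub-mcp`
T4: public reporting	财新 / 21世纪 / 中新社 / 澎湃 / 财联社	`brave-web-search`, `web_fetch`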
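The fall-through described above can be sketched as a loop over fetchers in tier order. This is a hedged illustration: `fetch_by_tier` and the zero-argument fetcher callables are hypothetical stand-ins for real MCP calls, and treating `None` as "empty" is an assumption.

```python
from typing import Callable, Optional


def fetch_by_tier(
    fetchers: list[tuple[str, Callable[[], Optional[float]]]],
) -> tuple[Optional[float], Optional[str], list[tuple[str, str]]]:
    """Try each (tier, fetcher) in order; return (value, tier_used, attempt_log).

    A fetcher that raises (e.g. on a 402 / 403) or returns None counts as a miss,
    and the next tier is tried. The attempt log doubles as a provenance note.
    """
    attempts: list[tuple[str, str]] = []
    for tier, fetch in fetchers:
        try:
            value = fetch()
        except Exception as exc:  # hypothetical fetchers may raise anything
            attempts.append((tier, f"error: {exc}"))
            continue
        if value is None:
            attempts.append((tier, "empty"))
            continue
        attempts.append((tier, "ok"))
        return value, tier, attempts
    # Nothing usable: the caller should label the cell N/A, never invent a value.
    return None, None, attempts
```

Returning the attempt log alongside the value keeps a record of which higher tiers were tried before a lower tier was accepted.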

Rule 5 — Cross-check every hard number + provenance gate (data accuracy, MANDATORY)

Every financial number in the deliverable (营业收入 / 净利润 / 毛利率 / 市值 / 股价 / 融资金额) must be verified by at least 2 independent sources from the tier table above, OR clearly flagged as "single-source estimate" with the source cited in a page footer caption. If 2 sources diverge by > 5%, report both and pick the more recent; add a footnote.
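The 5% divergence test can be sketched as follows. The rule does not say which value serves as the denominator, so this sketch assumes the smaller magnitude is the baseline; `cross_check` is a hypothetical helper, not part of the skill.

```python
def cross_check(value_a: float, value_b: float, tolerance: float = 0.05) -> dict:
    """Compare two independently sourced values for the same metric.

    The relative gap is measured against the smaller magnitude (an assumption).
    """
    baseline = min(abs(value_a), abs(value_b))
    if baseline == 0:
        diverges = value_a != value_b
        gap = float("inf") if diverges else 0.0
    else:
        gap = abs(value_a - value_b) / baseline
        diverges = gap > tolerance
    return {"gap": gap, "diverges": diverges}


if __name__ == "__main__":
    result = cross_check(1.34, 1.11)
    print(f"gap = {result['gap']:.1%}, diverges = {result['diverges']}")
```

When `diverges` is true, the rule above still applies: report both values, pick the more recent, and add a footnote.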

Mandatory Phase 5 QA gate — every deliverable must pass provenance_verify.py before being considered shippable. The script scans the analysis markdown for hard numbers (digit + 亿/万/%/RMB/USD/元/CNY/HKD/M/B) and confirms every one of them has a matching row in the companion data-provenance.md tracking table. Missing provenance → exit 1 → block delivery.

python3 skills/cn-client-investigation/scripts/provenance_verify.py \
    deliverable/analysis.md \
    deliverable/data-provenance.md
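To make the baseline gate concrete, here is a minimal sketch of a hard-number scan with a substring match against the provenance table. It is an approximation under stated assumptions: the regex, the word-boundary handling for ASCII units, and the name `missing_provenance` are illustrative, and the real provenance_verify.py may tokenize differently.

```python
import re

# A number (optional thousands separators and decimals) followed by one of the
# units listed above. ASCII units get a trailing word boundary so that "5 B"
# matches but "5 Build" does not (an assumption, not confirmed by the source).
HARD_NUMBER = re.compile(
    r"\d+(?:,\d{3})*(?:\.\d+)?\s*(?:亿|万|%|元|(?:RMB|USD|CNY|HKD|M|B)\b)"
)


def missing_provenance(analysis_md: str, provenance_md: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) for hard numbers absent from provenance."""
    missing = []
    for lineno, line in enumerate(analysis_md.splitlines(), start=1):
        for match in HARD_NUMBER.finditer(line):
            # Keep only the numeric part for the substring comparison.
            number = re.match(r"[\d,.]+", match.group()).group()
            if number not in provenance_md:
                missing.append((lineno, match.group()))
    return missing
```

A non-empty result corresponds to the "missing provenance → exit 1" outcome. Note that a plain substring match accepts `22.9` when the table only contains `122.9`, which is one reason a stricter mode is useful.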

Before shipping, also run in --strict mode. Strict adds two additional checks on top of the baseline substring match:

Estimate-as-T1 smuggling — if a hard number in analysis.md is adjacent to an estimate marker (~, 约, 大约, 估算, approximately, est., 粗估, 推算), the matching data-provenance.md row's source column MUST include at least one derivation keyword: [ESTIMATED] / [DERIVED] / 估算 / 推算 / derived / computed / analyst estimate. Otherwise the gate FAILs with the offending line number. This catches the common pattern where the agent writes ~22.9% but marks the provenance row with a T1 source like "Tushare Pro income_all" — which looked fine under the baseline substring check but is semantically misleading (the T1 source did not return 22.9%; the agent derived it).
Precision drift (WARN, non-blocking) — when the same rounded integer + unit appears with multiple precisions in the analysis (e.g. 1.34 元/股 vs 1.340 元/股 vs 1.3 元/股), the gate emits a WARN so the agent can decide whether the differing precisions are intentional (e.g. pre vs post restatement with footnote) or a typo.

python3 skills/cn-client-investigation/scripts/provenance_verify.py --strict \
    deliverable/analysis.md \
    deliverable/data-provenance.md

Non-strict mode behavior is unchanged — --strict is additive and opt-in.
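The two strict checks can be sketched roughly as below. The marker and keyword lists come from the text above; everything else (function names, the 16-character adjacency window, case-insensitive keyword matching, half-up rounding, and the reduced unit list) is an assumption made for illustration.

```python
import math
import re
from collections import defaultdict

ESTIMATE_MARKERS = ("~", "约", "大约", "估算", "approximately", "est.", "粗估", "推算")
DERIVATION_KEYWORDS = ("[ESTIMATED]", "[DERIVED]", "估算", "推算",
                       "derived", "computed", "analyst estimate")


def near_estimate_marker(line: str, number: str, window: int = 16) -> bool:
    """True if an estimate marker sits within `window` characters of `number`."""
    idx = line.find(number)
    if idx < 0:
        return False
    context = line[max(0, idx - window): idx + len(number) + window]
    return any(marker in context for marker in ESTIMATE_MARKERS)


def passes_estimate_check(line: str, number: str, source_column: str) -> bool:
    """An estimated number passes only if its provenance row admits derivation."""
    if not near_estimate_marker(line, number):
        return True
    lowered = source_column.lower()
    return any(keyword.lower() in lowered for keyword in DERIVATION_KEYWORDS)


# Number + unit pairs; this unit list is a small illustrative subset.
NUMBER_WITH_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*(元/股|亿|万|%|元)")


def precision_drift(text: str) -> dict[tuple[int, str], list[str]]:
    """Group numbers by (rounded integer, unit); keep groups with mixed decimal places."""
    groups: dict[tuple[int, str], set[str]] = defaultdict(set)
    for number, unit in NUMBER_WITH_UNIT.findall(text):
        key = (math.floor(float(number) + 0.5), unit)
        groups[key].add(number)
    return {
        key: sorted(values)
        for key, values in groups.items()
        if len({len(v.partition(".")[2]) for v in values}) > 1
    }
```

In this sketch a failed `passes_estimate_check` would block delivery, while any non-empty `precision_drift` result would only be surfaced as a warning.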

Every banker deliverable MUST include a data-provenance.md file at the deliverable root. Use the template under references/data-sources.md as the starting shape. Fill in one row per hard number with: 指标 / 数值 / 单位 / 期间 / Tier / 源 / URL 或工具 / 取数时间 / 交叉验证状态.

Rule 6 — No fabrication on missing data (data accuracy)

If a needed data point cannot be fetched (MCP returns error, web blocked, document inaccessible):

DO NOT invent a plausible-looking number
DO label the cell / chart / page-section as "数据不可得" / "N/A (source unavailable)" with a footnote explaining the attempted source
DO proceed with the rest of the analysis — missing data doesn't block the deck

Historical micro_probit / panel_var_model style illustrative data is NOT appropriate for China banker deliverables — those tools belong to the lab bundle and produce demonstration output only.

Rule 7 — Self-verify deck text before delivery (typo detection, MANDATORY GATE)

Typo detection is NOT an optional step — it is a compile-time gate. A pptx that has not passed cn_typo_scan.py is NOT a shippable deliverable.

The canonical way to enforce this is to base slides/compile.js on the provided template:

references/compile_with_typo_gate.template.js.txt

Copy it to the deliverable's slides/compile.js (dropping the .txt suffix), adjust SLIDE_COUNT / OUTPUT_PATH / THEME at the top, then run cd slides && node compile.js.

Why the .txt suffix in the plugin bundle: the template contains a child_process.spawnSync call to invoke the Python scanner, which OpenClaw's install-time safety scanner flags as a dangerous runtime pattern. Keeping the template as .js.txt under references/ tells the scanner this is documentation, not executable plugin code. At use time, you always copy it into your own deliverable's slides/ directory and strip the .txt — at that point it is your own script, outside the plugin trust boundary. The template:

Standard pptxgenjs compile loop (require slide-01.js … slide-NN.js, call createSlide(pres, theme), writeFile)
Spawn python3 with the skill's cn_typo_scan.py against the newly-written pptx's extracted text
If scan exit is non-zero, node process.exit(1) — the pptx is NOT considered delivered until the offending slide-NN.js files are fixed and the compile is re-run

If you cannot use the template verbatim (e.g. custom compile pipeline), you MUST still run the equivalent gate after every writeFile:

python3 -c "from pptx import Presentation; p = Presentation('deck.pptx'); [print(para.text) for s in p.slides for sh in s.shapes if sh.has_text_frame for para in sh.text_frame.paragraphs if para.text.strip()]" > /tmp/deck.txt
python3 skills/cn-client-investigation/scripts/cn_typo_scan.py /tmp/deck.txt  # exit 0 = ship, exit 1 = abort

cn_typo_scan.py greps for these red-flag patterns (all observed in 2026-04-18 runs or confirmed on the broader \uXXXX token-drift pattern space):

Rare character dyads that shouldn't appear in banker prose: 宽厭 / 谛 + digits / 洁利 / 贜 / 校虚 / 催化济 / 棒品 / 转映 / 艺瑞 / 调诚
Chinese chars immediately followed by digits (classic escape truncation symptom): [一-龥][0-9]
CJK Extension A / B / C / D characters (U+3400-U+4DBF, U+20000+) — almost always corruption in banker prose
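The three pattern classes above can be approximated in a few lines. This is a sketch, not the shipped cn_typo_scan.py: the `scan` function, the reason strings, and the upper bound U+2FFFF for the supplementary range are assumptions.

```python
import re

# Literal red-flag sequences from the list above. The 谛-plus-digit entry is
# assumed to be covered by the CJK-then-digit pattern below.
RED_FLAG_DYADS = ("宽厭", "洁利", "贜", "校虚", "催化济", "棒品", "转映", "艺瑞", "调诚")

# A basic CJK ideograph immediately followed by an ASCII digit.
CJK_THEN_DIGIT = re.compile("[一-龥][0-9]")

# CJK Extension A (U+3400..U+4DBF) plus the supplementary ideographic plane,
# where Extensions B, C and D live (upper bound chosen for illustration).
CJK_EXTENSION = re.compile("[\u3400-\u4DBF\U00020000-\U0002FFFF]")


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) findings for the extracted deck text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for dyad in RED_FLAG_DYADS:
            if dyad in line:
                findings.append((lineno, f"red-flag dyad {dyad}"))
        if CJK_THEN_DIGIT.search(line):
            findings.append((lineno, "CJK followed by digit"))
        if CJK_EXTENSION.search(line):
            findings.append((lineno, "CJK Extension character"))
    return findings
```

A caller would exit non-zero whenever `scan` returns any finding. The CJK-then-digit pattern also fires on tightly set text such as a Chinese word directly followed by a figure, so a hit needs a human look before it is treated as corruption.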

On scan hit:

Read the stderr report — each line gives L<n>, reason, and context snippet
Identify the source slide-NN.js file containing the offending text
Replace the broken Unicode string with the UTF-8 literal fix (preferably via LEXICON.red_zone.<key> lookup from references/cn-lexicon.js — Rule 2)
Re-run node slides/compile.js — the gate will rescan

Do NOT ship a pptx that has bypassed the gate. Do NOT --no-typo-scan your way out of failures.

Workflow

Phase 1 — Scope + lexicon load

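Confirm target: A-share / STAR / ChiNext / Beijing Stock Exchange (北交所) / Hong Kong-listed / US-listed Chinese stock / unlisted unicorn. This determines the data-source tier.
Load / update references/cn-lexicon.js:
- Target company name (Chinese / English / ticker)
- Top-5 peers (Chinese / English / ticker)
- Industry-specific terms (AI chips / new-energy vehicles / innovative drugs / SaaS)
Decide regulator context (CSRC / Hong Kong SFC / SEC for US-listed ADRs)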

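Phase 2 — Data collection (try tiers in order, record the source)

For each required data element (revenue / profit / share price / valuation / shareholding structure / management / business lines / competition / risk):

Call T1 MCP (Tushare / 巨潮 fetch). Record raw output.
If T1 failed or is incomplete, call T2 (official exchange sites / 天眼查). Record.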
Cross-check: pick any hard number from T1 vs T2 vs T3 — require ≥ 2 agreeing sources or flag.
Record source list in references/data-provenance.md (update per company) with URL + retrieval timestamp.

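Phase 3 — Analysis synthesis (classical investment-banking dimensions)

Follow the banker-classical analysis frame (customer-analysis-pack skill), enhanced with:

CN-specific shareholding structure (股权结构) section: actual controller / state-owned capital / employee shareholding / strategic investors / lock-up expiry schedule
CN-specific policy drivers (政策驱动) section: "十四五" (14th Five-Year Plan) / new infrastructure / 专精特新 designation / progress of domestic substitution
CN-specific regulatory risk (监管风险) section: CSRC penalty history / related-party transaction disclosure / new ESG rules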

Phase 3.5 — Raw-data snapshot (MANDATORY from v0.9.0)

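Why: a multi-company real test on 2026-04-20 found MiniMax writing names of tools that were never installed, such as "Wind (2026-04-17)" and "同花顺 F10", into the provenance Source column as if they were sources. That is pure fabrication. To close this hole, the agent must save the real MCP tool call results as JSON snapshots before writing analysis.md, as an audit trail.

Three CN MCPs (declared as dependencies in the plugin's .mcp.json):

MCP	Coverage	Key tools
`aigroup-market-mcp`	Listed-company quotes + financials (Tushare)	`basic_info` / `company_performance` / `stock_data` / `index_data` / `finance_news`
`PrimeMatrixData`	Listed + unlisted company registration + judicial + risk data (启信宝)	`basic_info` / `judicial_info` / `risk_info` / `shareholder_info` / `finance_info`
`Tianyancha`	Listed + unlisted company basics + full risk profile (天眼查)	`companyBaseInfo` / `risk`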

Requirement: save the raw JSON of every MCP call under <deliverable-dir>/raw-data/, using the filename format {identifier}-{mcp-short}-{tool}.json.

Listed companies (with a ts_code, e.g. 002594.SZ / 300750.SZ / 0700.HK) must include:

{ts_code}-aigroup-market-mcp-basic_info.json
{ts_code}-aigroup-market-mcp-company_performance.json
{ts_code}-aigroup-market-mcp-stock_data.json
≥1 enterprise-risk overlay: {uscc}-primematrix-basic_info.json (primary) or {uscc}-tianyancha-companyBaseInfo.json (fallback, see below)

Unlisted companies (only a unified social credit code / uscc) must include:

≥1 enterprise-risk overlay: {uscc}-primematrix-basic_info.json (primary) or {uscc}-tianyancha-companyBaseInfo.json (fallback). aigroup-market-mcp does not apply and can be omitted.

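Mandatory preliminary step: company name verification (v0.9.2+). For unlisted companies, before calling PrimeMatrixData__basic_info you must first call PrimeMatrixData__company_name with a fuzzy query to find the exact registered name. Common traps where the public name ≠ the legal name:

"字节跳动" (ByteDance): the mainland entity was renamed 抖音有限公司 in 2023
"京东数科": later renamed 京东科技控股股份有限公司
"滴滴" (DiDi): the mainland registered entity is 北京小桔科技有限公司

If you call basic_info with the public name, PrimeMatrix returns an empty {} object and all downstream data is blank. raw_data_check.py now detects "PM basic_info has no unified social credit code" and FAILs. The correct sequence:

step 1: PrimeMatrixData__company_name(blur_name="字节跳动")  →  list matching entities
step 2: human / agent picks the legal name  →  "抖音有限公司"
step 3: PrimeMatrixData__basic_info(company_name="抖音有限公司")  →  full registration record

Watch for empty risk_info returns: if PM risk_info returns only {"公司名称": "..."} with no fields such as 司法 / 经营异常 / 关联风险 (judicial / abnormal operations / related-party risk), that does not mean the company is clean; the PM API may simply have no data for that entity. Before delivery, the banker must manually re-check judicial announcements, administrative penalties, and the dishonest-judgment-debtor database. A passing gate is not proof of a clean record.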
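A small sketch of how the empty-return case could be detected before anyone reads it as a clean record. `needs_manual_risk_review` is a hypothetical helper, and it assumes the payload is a flat JSON object keyed by field name.

```python
def needs_manual_risk_review(risk_info: dict) -> bool:
    """True when the risk_info payload carries nothing beyond the company name.

    An empty dict or a dict holding only "公司名称" says nothing about the
    company's actual risk profile, so it should trigger a manual check.
    """
    informative_keys = set(risk_info) - {"公司名称"}
    return not informative_keys
```

For example, `needs_manual_risk_review({"公司名称": "抖音有限公司"})` returns `True`, signalling that the snapshot alone cannot support a "no risk found" statement.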

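Tianyancha current status (2026-04+): the Tianyancha account on the 智谱 MCP broker is suspended (balance exhausted). The gate accepts existing snapshots but does not require them; PrimeMatrixData is currently the only enterprise-risk overlay that is actually reachable. To re-enable Tianyancha, top up the account and register following the lead-discovery QUICKSTART.

data-provenance.md requirement: the filename stem of every raw-data/*.json file must appear at least once in the Source column of data-provenance.md. This closes the loop "numbers in the MD ↔ provenance table ↔ raw MCP calls".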
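The closed-loop requirement can be checked with a short script along these lines. It is a sketch only: `unreferenced_snapshots` is a hypothetical name, and it searches the whole provenance file rather than parsing out the Source column, which is a simplification.

```python
from pathlib import Path


def unreferenced_snapshots(deliverable_dir) -> list[str]:
    """Return raw-data filename stems that never appear in data-provenance.md."""
    root = Path(deliverable_dir)
    provenance = (root / "data-provenance.md").read_text(encoding="utf-8")
    return sorted(
        snapshot.stem
        for snapshot in (root / "raw-data").glob("*.json")
        if snapshot.stem not in provenance
    )
```

`Path.stem` strips only the final `.json`, so `002594.SZ-aigroup-market-mcp-basic_info.json` yields `002594.SZ-aigroup-market-mcp-basic_info`, which is the string expected in the provenance table. An empty list means the loop is closed.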

Worked example (BYD 002594.SZ):

deliverables/byd-20260420/
├── raw-data/
│   ├── 002594.SZ-aigroup-market-mcp-basic_info.json      ← basic_info result: company profile + industry + listing date
│   ├── 002594.SZ-aigroup-market-mcp-company_performance.json  ← revenue / net profit / gross margin / ROE time series
│   ├── 002594.SZ-aigroup-market-mcp-stock_data.json       ← daily OHLC for the past year + price adjustment
│   └── 91440300192317458F-tianyancha-companyBaseInfo.json  ← basic registration info + unified social credit code
├── data-provenance.md   (the Source column of each row holds `aigroup-market-mcp__company_performance` or the matching raw-data filename stem)
├── analysis.md
├── slides/ ...
└── 比亚迪_deep_analysis.pptx

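Verification: raw_data_check.py (run automatically as gate 3c of the validate-delivery.py aggregator) confirms that:

the raw-data/ directory exists and contains at least 3 JSON files
tool coverage meets the listed vs unlisted requirements above
every raw JSON filename stem is referenced in the provenance table

Back-compat: legacy deliverables from 0.8.x have no raw-data/, so gate 3c emits a WARN but does not FAIL in default mode, and FAILs outright under --strict-mcp. New deliverables must include raw-data/.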

Phase 4 — Deliverable generation (banker-memo preferred, v0.9.6+)

PREFERRED ROUTE (0.9.6+): Prompt-driven banker-memo skill + build_outline_deck.py.

The banker-memo skill dispatches the MiniMax agent through an investment-banker-analyst framework (8-section research memo + content-driven 10-15 slide outline, no fixed page count). Usage:

# 1. After raw-data/ is populated (Phase 3.5), dispatch banker-memo skill
python3 scripts/banker-memo/scripts/build_banker_prompt.py \
    <ts_code> <name_cn> <industry> <raw_dir> <out_dir> > /tmp/prompt.md
openclaw agent --agent main --thinking high --json --timeout 600 \
    --message "$(cat /tmp/prompt.md)"
# Agent writes analysis.md + slides-outline.md + data-provenance.md

# 2. Compile outline-driven PPT
python3 scripts/cn-client-investigation/scripts/build_outline_deck.py \
    <dir> <ts_code> <name_cn> <name_en>

# 3. Close provenance gaps + validate
python3 scripts/cn-client-investigation/scripts/sync_provenance.py <dir>
python3 scripts/cn-client-investigation/scripts/validate-delivery.py --strict-mcp <dir>

Why preferred: the 0.9.5 Python-templated build_deck.py produced an 8-slide data dashboard with no industry context, no peer benchmarking, and no SOTP / 4C's. In the 0.9.6 prompt-driven path, the agent writes a banker narrative (14-20 KB analysis) with peer comparison ([EST] tagged), Data Flag self-reporting (e.g. income vs company_performance 0.59pp discrepancy), SOTP valuation scenarios, and a 4C's credit framework with specific recommendations on credit line / tenor / interest rate / credit enhancement (授信额度 / 期限 / 利率 / 增信).

LEGACY ROUTE (still supported for quick fact-sheets): build_deck.py emits a fixed 8-slide stat-card dashboard. Use only when depth isn't required.

Both routes share: pptxgenjs slide compile + cn_typo_scan post-write gate. NEVER use python-pptx for generation (white background / no theming; only permitted for text extraction inside validate-delivery.py).

For CN targets, these routes override ppt-deliverable's "MiniMax first" routing — only pptxgenjs compile integrates the compile-time typo gate.

Steps:

Write slides/slide-01.js … slide-NN.js — each exports createSlide(pres, theme).
Copy references/compile_with_typo_gate.template.js.txt → slides/compile.js (strip .txt), set SLIDE_COUNT / OUTPUT_PATH / THEME.
cd slides && node compile.js — the template's post-write gate runs cn_typo_scan.py; if it fails, fix the offending slide-NN.js and recompile.
slides/slide-01.js cover uses Rule 3 (English hero 44pt + Chinese subtitle ≤28pt).
Every addText with Chinese content uses the lexicon (require('./references/cn-lexicon.js')); do NOT inline-type long Chinese strings.

Phase 5 — QA (single entry point via `validate-delivery.py`)

Recommended: run all six gates with a single command:

python3 ~/.openclaw/extensions/aigroup-financial-services-openclaw/skills/cn-client-investigation/scripts/validate-delivery.py \
    --strict --strict-mcp --style \
    /path/to/deliverable_dir
# exit 0 → all gates PASS; under --strict-mcp the Source column must also carry an MCP-tool / official-disclosure anchor and raw-data/ must be complete
# exit 1 → at least one gate failed; stderr names the gate plus the specific line number / reason

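The aggregator finds files by name and dispatches automatically:

*intelligence*.md → verify_intelligence.py (cross-plugin reference to cn-lead-safety in lead-discovery)
*.pptx → automatic text extraction + cn_typo_scan.py
*.pptx + data-provenance.md → slide_data_audit.py (since v0.6.0: every hard number on a slide must be backed by a provenance row; catches drift where slide-NN.js was hand-edited in Phase 4 without updating provenance)
analysis.md + data-provenance.md → provenance_verify.py (--strict enables estimate-as-T1 smuggling detection + the precision-drift WARN)
data-provenance.md → source_authenticity_check.py (NEW in v0.9.0: scans the Source column and blocks fabricated attributions to tools that are not installed, such as Wind / 同花顺; under --strict-mcp anything that is not an MCP tool or official disclosure FAILs)
<dir>/raw-data/ → raw_data_check.py (NEW in v0.9.0: validates the Phase 3.5 MCP call snapshots + listed vs unlisted coverage + the provenance reference loop; under --strict-mcp a missing raw-data/ FAILs outright)
Optional --style → style_scan.py --warn-only checks analysis.md for consistent currency / period / date / YoY terminology (non-blocking)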
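The filename-driven dispatch can be pictured with a small rule table. This is a simplified sketch, not the aggregator's actual code: `RULES`, `plan_gates`, and the use of `fnmatch` are assumptions, and flags such as --strict or --style are left out.

```python
from fnmatch import fnmatch

# Each rule: the name patterns that must all be present, and the gate to run.
RULES = [
    (("*intelligence*.md",), "verify_intelligence.py"),
    (("*.pptx",), "cn_typo_scan.py"),
    (("*.pptx", "data-provenance.md"), "slide_data_audit.py"),
    (("analysis.md", "data-provenance.md"), "provenance_verify.py"),
    (("data-provenance.md",), "source_authenticity_check.py"),
    (("raw-data",), "raw_data_check.py"),
]


def plan_gates(names: list[str]) -> list[str]:
    """Return the gate scripts whose required patterns are all matched by `names`."""
    return [
        script
        for patterns, script in RULES
        if all(any(fnmatch(name, pattern) for name in names) for pattern in patterns)
    ]
```

Given the entries of a deliverable directory, `plan_gates` lists which gates would apply, which makes it easy to see why a missing data-provenance.md silently removes several checks at once.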

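When debugging a single gate you can still run each one separately (see the individual scripts). Additional manual QA:

Sensitive items (undisclosed financial details / valuation multiples) must be labeled "估算" / "illustrative" with a caption if they do not come from a T1-T2 source.
Add a "known gaps / data confidence" (已知缺口 / 数据置信) summary section to the final deck's appendix.

Strongly recommended: after validate-delivery PASSes and before delivering to the client, also run the data-quality-audit skill (in the paired plugin aigroup-lead-discovery-openclaw) for independent cross-source verification. Layer 1's three upstream gates can only check that the deliverable is formally compliant (hard numbers have provenance / no typos / estimates are labeled correctly). Semantic correctness of the data (pre vs post restatement / ROE definition / share-price basis) requires pulling data from an independent second source and scoring it, which is the job of data-quality-audit. On an overall PASS, validate-delivery automatically prints a "Next step: run data-quality-audit skill …" hint at the end of stdout.

The 海天味业 audit of 2026-04-19 is one example: all three upstream gates were green, yet data-quality-audit caught an EPS 2022 pre/post-restatement FAIL (1.34 vs 1.11, a 20.7% gap). The --strict mode plus the restatement_aware rule now move part of this class of problem up into Layer 1, but full pre/post version identification still requires Layer 3 to pull data independently and compare.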

Output standard

MD working draft + PPTX deliverable
references/data-provenance.md listing every number source
cn_typo_scan.py output attached as QA evidence
Absolute paths in final report

Integration with existing skills

This skill supersedes the generic customer-investigation + customer-analysis-pack flow when the target is a China entity. The banker analysis frame (datapack-builder 8-tab structure, dcf-model WACC methodology, pitch-deck slide conventions) still applies; this skill adds the text-safety and CN-specific data-source layer on top of it.

For non-CN targets (US / EU / JP / KR / IN / SE-Asia / LATAM), use the generic skills without this overlay.

Not in scope

Non-Chinese company analysis (use generic banker skills)
Experimental econometric validation with aigroup-econ-mcp (lab bundle only; not appropriate for banker client deliverables)
Embedded markdown→pptx via aigroup-mdtopptx-mcp (lab bundle only)
