Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

国家统计局数据采集 (NBS Data Collection)

v1.0.0

NBS data collection skill. Triggered when the user needs to collect macroeconomic data from China's National Bureau of Statistics (NBS), such as GDP, CPI, PPI, and the output gap. Applicable scenarios include: (1) collecting indicators such as GDP, CPI, and PPI; (2) fetching data from the NBS website / statistical yearbooks; (3) computing the output gap (HP filter); (4) organizing data into Excel. The skill includes a standard workflow, data-source U...

by Jackie Zhang (@yingjie-zhang-dev)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for yingjie-zhang-dev/nbs-data-collection.

Prompt Preview: Install & Setup
Install the skill "国家统计局数据采集" (yingjie-zhang-dev/nbs-data-collection) from ClawHub.
Skill page: https://clawhub.ai/yingjie-zhang-dev/nbs-data-collection
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install nbs-data-collection

ClawHub CLI


npx clawhub@latest install nbs-data-collection
Security Scan

VirusTotal: Benign (View report →)

OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name/description match the code and instructions: scripts and SKILL.md focus on downloading NBS data, computing HP filter, adding charts, verifying links and producing Excel. Network endpoints referenced are the expected NBS domains (stats.gov.cn, data.stats.gov.cn) and examples (Google search). No unrelated cloud credentials, exotic binaries or unrelated services are requested.
Instruction Scope
Instructions remain within the stated purpose (planning → crawl → process → verify → output). However, SKILL.md and the README do not fully declare the runtime needs: the scripts call curl (via subprocess) and require Python packages (openpyxl, numpy). verify_links.py extracts hyperlinks from a provided Excel file and runs curl against them; this is expected for link-checking, but it means the skill will attempt network connections to any URL present in the spreadsheet (including internal or private URLs).
Install Mechanism
This is an instruction-only skill with no install spec. No third-party archives or arbitrary download+extract steps are used. The risk here is operational: the code expects external Python packages and curl to be available on the host, but the skill does not install them itself.
Credentials
The skill requests no environment variables or credentials (good). But verify_links.py will execute curl on every URL it finds in an Excel file; if a user-supplied spreadsheet contains internal network URLs or attacker-controlled endpoints, the script will probe them from the host. That can be used to scan internal services or cause unwanted outbound requests. Also some files contain placeholder/TODO behavior (nbs_crawler returns gdp/cpi fields as None) — not a credential issue but affects usefulness.
Persistence & Privilege
Skill does not request persistent/high privileges or always:true. It writes output/checkpoint files to a local output/ path (normal). It does not modify other skills or system-wide configuration.
What to consider before installing
This skill is generally coherent with its stated purpose, but review it and take precautions before running:

- Inspect any Excel files you provide: verify_links.py will extract hyperlinks and run curl against each URL. Do not give it spreadsheets containing internal/private URLs, as this could trigger unwanted probes of internal networks.
- Run the scripts in an isolated environment (sandbox, VM, or container) if you are unsure about network or host exposure.
- Ensure the required runtime dependencies are present: Python 3.10+, openpyxl, numpy (and optionally scipy if you add it), and the curl binary. The skill does not install these automatically.
- Note that some crawler functions are placeholders (TODO) and return null values; verify that nbs_crawler actually retrieves data for your target timeframe before relying on its outputs.
- For stricter behavior, consider modifying verify_links.py to whitelist domains (e.g., stats.gov.cn, data.stats.gov.cn) rather than checking all links, or add an option to skip network checks.

The scanner also offers to list exact commands for creating an isolated environment and installing dependencies, show the minimal change to make verify_links.py domain-whitelist-only, or walk through where the crawler functions need real API calls.
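The domain-whitelist change suggested above can be sketched as a small predicate. Note that `is_allowed` and the `ALLOWED_DOMAINS` set are hypothetical names for illustration, not part of the shipped verify_links.py:

```python
from urllib.parse import urlparse

# Assumed whitelist: the NBS domains named in the scan report.
ALLOWED_DOMAINS = {"stats.gov.cn", "data.stats.gov.cn"}

def is_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True only if the URL's host is an allowed domain or a
    subdomain of one; anything else (including internal IPs) is skipped
    instead of being probed with curl."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed)
```

A link checker would then call `is_allowed(url)` before issuing any request, so a spreadsheet containing `http://10.0.0.1/admin` never triggers an outbound probe.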

Like a lobster shell, security has layers — review code before you run it.

Tags: china · data · economy · gdp · latest · nbs
96 downloads
0 stars
1 version
Updated 4w ago
v1.0.0
MIT-0

国家统计局数据采集 (NBS Data Collection) Skill

Quick Start

When the user asks to collect NBS data, follow this workflow:

User request → Planning → Data collection → Data processing → Quality verification → Output delivery

Workflow (6 Phases)

Phase 1: Planning (吏部)

  1. Clarify which indicators to collect and the data range
  2. Confirm the data frequency (quarterly/monthly/annual)
  3. Assess data availability (pre-2015 data may be subject to special restrictions)
  4. Plan the collection order

Phase 2: Data Collection (户部)

  1. Fetch raw data from data.stats.gov.cn or the statistical communiqués
  2. Save to checkpoint files (supports resumable collection)
  3. Record the data source and collection time

Phase 3: Data Processing (工部)

  1. Compute derived indicators:
    • GDP per capita = GDP ÷ population
    • GDP deflator ≈ chained CPI index
    • Output gap = HP filter (λ=1600)
  2. Quarterly means (monthly → quarterly)
  3. Year-over-year / quarter-over-quarter changes
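Steps 2 and 3 above (monthly → quarterly means, year-over-year changes) can be sketched with numpy; the function names are illustrative:

```python
import numpy as np

def quarterly_means(monthly):
    """Collapse a monthly series (length divisible by 3) to quarterly means."""
    m = np.asarray(monthly, dtype=float)
    return m.reshape(-1, 3).mean(axis=1)

def yoy_growth(quarterly):
    """Year-over-year growth (%) for a quarterly series;
    the first four values have no prior-year base and are NaN."""
    q = np.asarray(quarterly, dtype=float)
    out = np.full_like(q, np.nan)
    out[4:] = (q[4:] / q[:-4] - 1.0) * 100.0
    return out
```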

Phase 4: Quality Verification (刑部)

  1. Spot checks: verify 2-3 key data points against the communiqués/yearbooks
  2. Calculation checks: confirm the derived indicators are computed correctly
  3. Discrepancy handling: if discrepancies are found, record and explain them

Phase 5: Links & Visualization

  1. Add data-source links (see "Data Source URLs" below)
  2. Draw line charts: show trends in time-series data with line charts
  3. Verify link validity: batch-check all links

Phase 6: Output Delivery

  1. Excel file (multiple sheets, one per indicator)
  2. A data-notes sheet (documenting calculation methods and data sources)
  3. Checkpoint file backups

Key Indicators

| Indicator | Frequency | Source | Notes |
|---|---|---|---|
| Real GDP | Quarterly | NBS API / statistical yearbook | Published; label "preliminary estimate" vs "final verified" |
| GDP growth | Quarterly | Computed or yearbook | Year-over-year growth |
| GDP per capita | Quarterly | Computed | = GDP ÷ population (quarterly estimate) |
| CPI | Monthly → quarterly mean | NBS API | Same month of prior year = 100 |
| PPI | Monthly → quarterly mean | NBS API | Same month of prior year = 100 |
| GDP deflator | Annual | Computed | ≈ chained quarterly CPI index (label ⚠️ estimate) |
| Output gap | Quarterly | HP filter (λ=1600) | Trend component |

HP Filter Parameters

λ = 1600  # standard value for quarterly data
# annual: λ = 100; monthly: λ = 14400
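A minimal HP filter consistent with these parameters can be written with numpy alone, by solving the penalized least-squares system (I + λ DᵀD)τ = y, where D is the second-difference operator. This is a sketch, not the shipped scripts/hp_filter.py:

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Hodrick-Prescott filter: split a series into trend and cycle.

    Solves (I + lam * D'D) tau = y, where D is the (n-2) x n
    second-difference operator. lam=1600 is the quarterly standard.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Build the second-difference matrix D: each row is (1, -2, 1).
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    trend = np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)
    cycle = y - trend  # the output gap, when y is (log) real GDP
    return trend, cycle
```

For a perfectly linear series the second differences are zero, so the trend equals the input and the cycle is zero, which makes a convenient sanity check. (For long series, a sparse solver would be preferable to this dense one.)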

Data Source URLs

Statistical Yearbooks (important: URL formats are inconsistent)

| Data year | Yearbook URL format | Notes |
|---|---|---|
| 2003 | .../ndsj/yb2004-c/indexch.htm | ⚠️ special format |
| Other years | .../ndsj/{yearbook_year}/indexch.htm | yearbook year = data year + 1 |

Rule: data year N → yearbook year N+1

  • 2003 data → 2004 yearbook (URL: .../ndsj/yb2004-c/indexch.htm)
  • 2024 data → 2025 yearbook (URL: .../ndsj/2025/indexch.htm)
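The rule and its special case can be captured in a small helper. `yearbook_url` and its `base` parameter are hypothetical names; the full `.../ndsj/` prefix is elided in the notes above, so it is left as an argument:

```python
def yearbook_url(data_year: int, base: str) -> str:
    """Build the yearbook index URL for a given data year.

    `base` is the .../ndsj/ prefix (kept as a parameter because the
    full host path is elided in the source notes).
    """
    yearbook_year = data_year + 1      # rule: data year N -> yearbook year N+1
    if data_year == 2003:              # known special case: yb2004-c format
        return f"{base}yb{yearbook_year}-c/indexch.htm"
    return f"{base}{yearbook_year}/indexch.htm"
```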

Statistical Communiqués (for the latest data, e.g. 2025)

| Quarter | URL |
|---|---|
| 2025Q1 | https://www.stats.gov.cn/sj/zxfb/202504/t20250417_1959334.html |
| 2025Q2 | https://www.stats.gov.cn/sj/zxfb/202507/t20250716_1960426.html |
| 2025Q3 | https://www.stats.gov.cn/sj/zxfb/202510/t20251021_1961646.html |
| 2025Q4 | https://www.stats.gov.cn/sj/zxfb/202601/t20260120_1962349.html |

data.stats.gov.cn

  • Homepage: https://data.stats.gov.cn
  • ⚠️ The old /easyquery.htm API now returns 404; use the new API

Output File Structure

output/
├── 国民经济核算与价格指数_YYYY-MM-DD.xlsx
│   ├── 实际GDP (季度)
│   ├── GDP增速
│   ├── 人均GDP
│   ├── 产出缺口
│   ├── CPI (季度均值)
│   ├── PPI (季度均值)
│   ├── GDP平减指数
│   └── 数据说明
├── checkpoint_gdp.csv
├── checkpoint_hp_filter.csv
└── raw_data/
    ├── gdp_quarterly.json
    └── cpi_monthly.json

Reusable Scripts

  • scripts/nbs_crawler.py - main data-collection script
  • scripts/verify_links.py - batch-verify link validity
  • scripts/add_charts.py - add line charts to Excel files
  • scripts/hp_filter.py - HP filter computation
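As a rough illustration of what a script like add_charts.py does (a sketch using openpyxl, not the skill's actual code), the pattern is: write a time series to a sheet, then attach a LineChart referencing those cells:

```python
import io

from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

# Sample data; a real script would read it from the generated workbook.
wb = Workbook()
ws = wb.active
ws.title = "CPI"
ws.append(["quarter", "cpi"])
for quarter, value in [("2024Q1", 100.1), ("2024Q2", 100.3), ("2024Q3", 100.4)]:
    ws.append([quarter, value])

chart = LineChart()
chart.title = "CPI (quarterly mean)"
data = Reference(ws, min_col=2, min_row=1, max_row=ws.max_row)  # header + values
cats = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)  # quarter labels
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
ws.add_chart(chart, "D2")  # anchor the chart next to the data

wb.save(io.BytesIO())  # in-memory here; a real script saves to a .xlsx path
```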

Notes

  1. 2015 cutoff: pre-2015 data may need to be extracted from yearbooks via OCR
  2. Data revisions: the latest figures (e.g. 2025) may differ from preliminary estimates; use the final verified numbers
  3. SSL issues: yearbook pages for some years may intermittently return 404
  4. Labeling computed items: every estimated/computed item must be flagged with ⚠️ and its calculation method

Reference Documents
