Cn Financial Notes Extraction

v1.0.0

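Extracts the detailed notes to the financial statements from annual reports (PDF) of China A-share listed companies. Useful for obtaining granular data that the primary statements do not show, such as CapEx breakdowns, R&D expense breakdowns, accounts receivable aging, and related-party transactions.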


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for cgxxxxxxxxxxxx/cn-financial-notes-extraction.

Install the skill "Cn Financial Notes Extraction" (cgxxxxxxxxxxxx/cn-financial-notes-extraction) from ClawHub.
Skill page: https://clawhub.ai/cgxxxxxxxxxxxx/cn-financial-notes-extraction
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install cn-financial-notes-extraction

ClawHub CLI

npx clawhub@latest install cn-financial-notes-extraction
Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description (extract financial-note tables from A-share annual report PDFs) align with the workflow and code snippet: locating the '财务报表附注' section, using pdfplumber to extract tables, filtering and structuring results. Referencing CNInfo/East Money as PDF sources is consistent with the stated purpose.
Instruction Scope
SKILL.md stays focused on PDF download and parsing. The only notable instruction beyond pure parsing is the advisory: 'don't run in subagent (sandbox) unless pdfplumber is installed in it'—this steers execution toward the host environment. That is not inconsistent with the purpose but does broaden the agent's operational surface (runs code against host filesystem/environment). The instructions do not ask for unrelated system files, credentials, or data exfiltration.
Install Mechanism
No install spec and no code files — instruction-only. This is low-risk from an installer perspective (nothing will be downloaded/installed automatically by the skill itself).
Credentials
The skill requests use of the host's python3 and pdfplumber (and optionally OCR tools like Tesseract/MinerU). It requests no environment variables or credentials. Using host Python is reasonable for PDF processing, but advising against sandbox use increases privilege required at runtime; that should be considered when granting execution rights.
Persistence & Privilege
Skill is user-invocable, not always-enabled, and does not request persistent system-wide privileges or modify other skills. It does not request autonomous always-on presence.
Assessment
This skill is coherent with its description and doesn't request secrets, but it tells the agent to run on the host Python environment rather than in a sandbox. Before installing or running: (1) prefer executing the extraction in an isolated environment (container or VM) to limit risk from malicious or malformed PDFs and third-party OCR binaries; (2) review and test the exact Python code you run and only install pdfplumber / Tesseract from trusted sources; (3) avoid giving the agent broad host filesystem or network access—limit it to the directories and network endpoints needed to fetch known PDF sources; (4) if you cannot run in isolation, treat the 'run on host' instruction as a reason to be cautious. If you want higher assurance, ask the skill author for a signed/reviewable code implementation or run the provided extraction logic yourself in a controlled environment.

Like a lobster shell, security has layers — review code before you run it.

MIT-0

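Core Capability

Precisely locates and extracts tabular data from the notes to the financial statements in annual report PDFs downloaded from CNINFO (巨潮资讯).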

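Use Cases

  • CapEx analysis: extract the current-period increase/decrease details from notes such as construction in progress (在建工程), fixed assets (固定资产), and intangible assets (无形资产) (MD&A basis vs. cash flow statement basis).
  • Risk screening: extract accounts receivable aging, bad-debt provision ratios, and goodwill impairment details.
  • Related-party transactions: extract related-party balances and purchase/sale amounts.
  • R&D breakdown: extract the split of R&D spending between capitalized and expensed amounts.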

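Workflow

  1. Download: fetch the latest annual report PDF via CNInfo API Scraper or East Money Announcement Downloader.
  2. Locate: open the PDF with pdfplumber (system Python environment) and search the full text for the keyword 财务报表附注 ("notes to the financial statements").
  3. Extract:
    • Scan page by page, starting from the located page.
    • Extract tables with extract_tables().
    • Smart filtering: keep only tables whose header contains keywords such as 项目 (item), 期末余额 (closing balance), 本期增加 (increase in current period), 本期减少 (decrease in current period), or 账面余额 (book balance).
    • Skip text-only pages and layout tables that carry no data.
  4. Structure: convert the extracted data to Dict[List] or DataFrame format for output.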
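
Step 4 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the skill's own implementation: the helper names `clean_cell` and `table_to_columns` are hypothetical, the first row of each table is assumed to be the header, and cells are assumed to be either strings (possibly containing line breaks) or None, as returned by pdfplumber's extract_tables().

```python
from typing import Dict, List, Optional

Table = List[List[Optional[str]]]


def clean_cell(cell: Optional[str]) -> str:
    """Normalize one cell: None becomes '', line breaks inside a cell are removed."""
    return "" if cell is None else cell.replace("\n", "").strip()


def table_to_columns(table: Table) -> Dict[str, List[str]]:
    """Convert a row-oriented table (header in row 0) into a column -> values dict."""
    header = [clean_cell(c) for c in table[0]]
    names: List[str] = []
    for i, h in enumerate(header):
        name = h or f"col_{i}"      # hypothetical fallback name for blank headers
        if name in names:           # keep duplicate header names distinct
            name = f"{name}_{i}"
        names.append(name)

    columns: Dict[str, List[str]] = {name: [] for name in names}
    for row in table[1:]:
        # Pad short rows so every column receives exactly one value per row.
        padded = list(row) + [None] * (len(names) - len(row))
        for name, cell in zip(names, padded):
            columns[name].append(clean_cell(cell))
    return columns


# Illustrative data only, not taken from a real annual report.
sample = [
    ["项目", "期末\n余额", "期初余额"],
    ["房屋及建筑物", "1,200.00", "1,000.00"],
    ["机器设备", None, "500.00"],
]
result = table_to_columns(sample)
```

If pandas is installed, passing the resulting dict to `pandas.DataFrame(result)` should give the DataFrame form mentioned in step 4.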

Key Parameters and Code Logic

import pdfplumber

def extract_notes(pdf_path, keywords=None):
    """Return [{"page": 1-based page number, "table": rows}, ...] from the notes section."""
    found_data = []
    with pdfplumber.open(pdf_path) as pdf:
        # 1. Locate the first page whose text mentions the notes section.
        #    If the keyword is never found, start_idx stays 0 and the whole file is scanned.
        start_idx = 0
        for i, page in enumerate(pdf.pages):
            text = page.extract_text()
            if text and "财务报表附注" in text:
                start_idx = i
                break

        # 2. Scan tables from the start page onward
        for i in range(start_idx, len(pdf.pages)):
            for table in pdf.pages[i].extract_tables():
                # Skip short tables (3 rows or fewer) and tables whose first column is empty
                if len(table) <= 3 or not any(row[0] for row in table if row):
                    continue
                # Optional: when keywords are given, keep only tables whose header row matches
                if keywords:
                    headers = " ".join(str(c) for c in table[0] if c)
                    if not any(kw in headers for kw in keywords):
                        continue
                found_data.append({"page": i + 1, "table": table})
    return found_data
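
The header check inside extract_notes can also be factored into a small pure function, which makes it testable without a PDF. The sketch below is a variant under stated assumptions rather than the original logic: the names `header_matches` and `DEFAULT_KEYWORDS` are hypothetical, and unlike the code above it strips whitespace and line breaks from each header cell before matching, on the assumption that a wrapped header may be extracted as two lines within one cell.

```python
from typing import List, Optional, Sequence

# Header keywords listed in the workflow section; treat the default set as an assumption.
DEFAULT_KEYWORDS = ("项目", "期末余额", "本期增加", "本期减少", "账面余额")


def header_matches(
    table: List[List[Optional[str]]],
    keywords: Sequence[str] = DEFAULT_KEYWORDS,
) -> bool:
    """Return True if any keyword appears in a cell of the table's first row."""
    if not table or not table[0]:
        return False
    # Remove all whitespace inside each cell, so "期末\n余额" becomes "期末余额".
    cells = ["".join(str(c).split()) for c in table[0] if c]
    # Join with a separator so a keyword cannot match across two adjacent cells.
    header = "|".join(cells)
    return any(kw in header for kw in keywords)


# Illustrative tables only.
wrapped = [["类别", "期末\n余额"], ["A", "1"]]
layout_only = [["序号", "姓名"], ["1", "张三"]]
```

Here `header_matches(wrapped)` is true because the wrapped cell normalizes to 期末余额, while `header_matches(layout_only)` is false.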

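Notes

  • Environment dependency: use the host machine's python3 and pdfplumber. Do not run directly inside a sandbox (subagent) unless you have confirmed that the library is installed there.
  • Table merging: a table that spans pages may be split into two and needs to be merged logically (by checking header continuity).
  • Non-standard layouts: a very small number of older annual reports may be scanned images and require OCR (e.g. MinerU or Tesseract). Most current A-share annual reports are native PDFs, on which pdfplumber works well.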
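
The table-merging note can be sketched as below. This is one possible reading of "header continuity", not the skill's own method: the function name `merge_split_tables` and the `last_page` key are hypothetical, the input is assumed to be the list returned by extract_notes, and a table is treated as a continuation only when it sits on the page immediately after the previous table and its first row repeats that table's header. Continuations that do not repeat the header are left unmerged.

```python
from typing import Any, Dict, List


def _norm(row: List[Any]) -> List[str]:
    """Normalize a header row: drop whitespace inside cells, map None to ''."""
    return ["".join(str(c).split()) if c else "" for c in row]


def merge_split_tables(found: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge tables split across consecutive pages when the header row repeats."""
    merged: List[Dict[str, Any]] = []
    for item in found:
        table = item["table"]
        prev = merged[-1] if merged else None
        is_continuation = (
            prev is not None
            and bool(table)
            and bool(prev["table"])
            and item["page"] == prev["last_page"] + 1
            and _norm(table[0]) == _norm(prev["table"][0])
        )
        if is_continuation:
            prev["table"].extend(table[1:])   # skip the repeated header row
            prev["last_page"] = item["page"]
        else:
            merged.append(
                {"page": item["page"], "last_page": item["page"], "table": list(table)}
            )
    return merged


# Illustrative data only.
header = ["项目", "期末余额"]
found = [
    {"page": 10, "table": [header, ["A", "1"], ["B", "2"]]},
    {"page": 11, "table": [header, ["C", "3"]]},
    {"page": 13, "table": [header, ["D", "4"]]},
]
merged = merge_split_tables(found)
```

Two genuinely separate tables that share a header and fall on consecutive pages would also be merged by this rule, so the output is worth spot-checking against the PDF.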
