Fund Report Extractor

v1.0.0

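Automatically extracts the full text of the "Investment Strategy and Operations Analysis" (投资策略和运作分析) section from the periodic reports of public mutual funds, with precise locating and consolidation for both text-based and scanned PDFs.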


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for suncxw-creator/fund-report-extractor.

Install the skill "Fund Report Extractor" (suncxw-creator/fund-report-extractor) from ClawHub.
Skill page: https://clawhub.ai/suncxw-creator/fund-report-extractor
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install suncxw-creator/fund-report-extractor

ClawHub CLI


npx clawhub@latest install fund-report-extractor

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
Name/description (extract fund report 'investment strategy' sections) align with the code and SKILL.md. Required libraries (akshare, pdf parsers, requests) are appropriate for scraping and parsing PDFs; no unrelated credentials or binaries are requested.
Instruction Scope
SKILL.md and extract.py confine actions to: fetching announcement lists via akshare, constructing PDF URLs on pdf.dfcfw.com, downloading PDFs, extracting text with pdfplumber or PyMuPDF, and writing text files locally. There are no instructions to read unrelated files, access other credentials, or exfiltrate data to unexpected endpoints.
Install Mechanism
There is no install spec (instruction-only + a single Python script). Dependencies are standard Python packages from PyPI; no remote archives or obscure installers are downloaded by the skill itself.
Credentials
No environment variables, secrets, or config paths are requested. The skill only needs network access for public data and permission to write files in the working directory — both are reasonable for this task.
Persistence & Privilege
The skill does not request permanent/always-on inclusion and does not modify other skills or system-wide settings. It simply writes output files to its working directory.
Assessment
This skill appears to do what it says: download public fund PDFs and extract the 'investment strategy and operations analysis' sections. Before running, consider:

  1) The package source/homepage is unknown; review the extract.py source (you have it) and ensure it matches your expectations.
  2) Run in an isolated environment (virtualenv or container) to limit side effects.
  3) Dependencies (akshare, pymupdf, pdfplumber, requests, pandas) will be installed from PyPI; pin versions if you care about supply-chain risk.
  4) The script downloads PDFs from pdf.dfcfw.com and writes files to the current directory; ensure you are comfortable with network access and disk writes.
  5) There are no credentials requested and no obvious exfiltration, but if you plan to run this inside sensitive environments, validate akshare's network behavior and avoid exposing secrets.

If you want higher assurance, run it on a small test fund code and inspect the downloaded PDFs and outputs first.


Updated 1mo ago
MIT-0

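Fund Periodic Report Investment Strategy Extraction Skill

Function

Automatically extracts the full text of the "Investment Strategy and Operations Analysis" (投资策略和运作分析) section from the periodic reports of public mutual funds.

Use cases

  • Extracting the periodic reports of funds managed by a given fund manager
  • Obtaining the original text of "报告期内基金的投资策略和运作分析" (the fund's investment strategy and operations analysis during the reporting period)
  • Compiling the results in chronological order (oldest first)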

Usage

1. Get the fund code

If you do not know the fund code, look it up first:

  • Search by fund name on 天天基金网 (Tiantian Fund)
  • Or search with AKShare
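
The AKShare route could look roughly like the sketch below. It assumes that `ak.fund_name_em()` returns a table with `基金代码` (fund code) and `基金简称` (short name) columns; those column names and both helper functions are illustrative assumptions, so check the DataFrame you actually receive.

```python
def search_funds(rows, keyword):
    """Return (code, name) pairs whose short name contains `keyword`.

    `rows` is an iterable of dicts. The column names 基金代码 / 基金简称
    are assumed here, not confirmed by this skill.
    """
    matches = []
    for row in rows:
        name = str(row.get("基金简称", ""))
        if keyword in name:
            matches.append((str(row.get("基金代码", "")), name))
    return matches


def load_fund_rows():
    """Fetch the full fund list via AKShare (needs network access)."""
    import akshare as ak  # imported lazily so search_funds works offline

    return ak.fund_name_em().to_dict("records")
```

A call such as `search_funds(load_fund_rows(), "景顺长城")` would then list candidate codes to pass to `--code`.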

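2. Run the extraction script

python 基金报告提取.py --code <fund code> --name "<fund name>"

The security scan above refers to the script as extract.py; use the filename that ships with the installed skill.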

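Technical notes

1. Data retrieval: AKShare

import akshare as ak
# symbol is the 6-digit fund code
df = ak.fund_announcement_report_em(symbol='<fund code>')
  • Returns the fund's full list of historical announcements
  • Each row includes an announcement (report) ID, from which the PDF download link can be built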
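
Before downloading anything, the announcement list usually needs to be narrowed to periodic reports and put in chronological order. The sketch below works on `df.to_dict("records")`; the column names `公告标题`, `公告日期` and `报告ID`, the title markers, and the helper name are assumptions to verify against the real DataFrame.

```python
# Title fragments that typically identify periodic reports (an assumption).
REPORT_MARKERS = ("季度报告", "中期报告", "半年度报告", "年度报告")


def select_periodic_reports(rows):
    """Keep rows whose title looks like a periodic report, oldest first."""
    selected = [
        row
        for row in rows
        if any(marker in str(row.get("公告标题", "")) for marker in REPORT_MARKERS)
    ]
    # ISO-style dates sort correctly as strings.
    return sorted(selected, key=lambda row: str(row.get("公告日期", "")))
```

Sorting here means the later summary can simply be written in the order the reports are processed.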

2. PDF download link format

http://pdf.dfcfw.com/pdf/H2_{report ID}_1.pdf
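
A minimal sketch of turning a report ID into a download, using the URL pattern above. The helper names and the 30-second timeout are illustrative choices, not part of the skill.

```python
PDF_URL_TEMPLATE = "http://pdf.dfcfw.com/pdf/H2_{report_id}_1.pdf"


def build_pdf_url(report_id):
    """Fill the report ID into the download URL pattern."""
    return PDF_URL_TEMPLATE.format(report_id=str(report_id).strip())


def download_pdf(report_id, timeout=30):
    """Download one report and return its raw bytes (needs network access)."""
    import requests  # imported lazily so build_pdf_url works without it

    response = requests.get(build_pdf_url(report_id), timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
    return response.content
```

The returned bytes can be passed straight to `fitz.open(stream=..., filetype='pdf')` or written to disk for pdfplumber.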

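3. PDF parsing options

Option A: PyMuPDF (text-based PDFs)

import fitz  # PyMuPDF
import re

# pdf_content: raw bytes of the downloaded PDF
doc = fitz.open(stream=pdf_content, filetype='pdf')
chars = []
for page in doc:
    html = page.get_text('html')
    # Chinese characters appear in the HTML as hex entities (&#x....;); decode them.
    # ASCII characters are not entity-encoded, so digits and Latin letters are dropped.
    codes = re.findall(r'&#x([0-9a-fA-F]+);', html)
    chars.extend(chr(int(c, 16)) for c in codes)
full_text = ''.join(chars)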

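Option B: pdfplumber (scanned PDFs)

import pdfplumber

# pdf_file: path to the downloaded PDF (or a file-like object)
# Note: pdfplumber reads the embedded text layer only; a purely image-based scan needs OCR first.
with pdfplumber.open(pdf_file) as pdf:
    all_text = ''
    for page in pdf.pages:
        text = page.extract_text()
        if text:  # skip pages with no extractable text
            all_text += text + '\n'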
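
Since it is not always known in advance which option suits a given PDF, one possible arrangement is to try the parsers in order and accept the first that yields a reasonable amount of text. This is a sketch only: `extract_with_fallback` and the 200-character threshold are assumptions, and the extractor callables are expected to wrap Option A and Option B.

```python
def extract_with_fallback(pdf_bytes, extractors, min_chars=200):
    """Try each (name, extractor) pair in order.

    Returns (name, text) for the first extractor that produces at least
    `min_chars` non-blank characters, or (None, "") if none does.
    """
    for name, extractor in extractors:
        try:
            text = extractor(pdf_bytes) or ""
        except Exception:
            # A parser that crashes on this file is treated like an empty result.
            continue
        if len(text.strip()) >= min_chars:
            return name, text
    return None, ""
```

Recording which parser succeeded makes it easier to choose the matching keyword variant in the next step.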

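4. Keyword locating

The keyword's position differs across fund companies and report types:

Text-based PDF (景顺长城, Invesco Great Wall):

  • "报告期内基金的投资策略和运作分析" (investment strategy and operations analysis of the fund during the reporting period)
  • "管理人对报告期内基金的投资策略和业绩表现的说明" (the manager's explanation of the fund's investment strategy and performance during the reporting period)
  • "管理人对宏观经济、证券市场及行业走势的简要展望" (the manager's brief outlook on the macroeconomy, the securities market and industry trends)

Scanned PDF (中泰星元):

  • "4.4 报告期内基金的投资策略和运作分析"
  • The content is usually on pages 7-9
  • The keyword has to be searched page by page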
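
The page-by-page search could be sketched as follows, taking the list of per-page strings produced by `page.extract_text()`. Whitespace is removed from both sides before comparing, because extracted headings often contain stray spaces or line breaks; the function name is hypothetical.

```python
import re


def find_keyword_pages(page_texts, keyword):
    """Return 1-based page numbers whose text contains `keyword`, ignoring whitespace."""
    target = re.sub(r"\s+", "", keyword)
    hits = []
    for number, text in enumerate(page_texts, start=1):
        compact = re.sub(r"\s+", "", text or "")  # extract_text() may return None
        if target in compact:
            hits.append(number)
    return hits
```

With pdfplumber this might be called as `find_keyword_pages([p.extract_text() for p in pdf.pages], "4.4 报告期内基金的投资策略和运作分析")`, which avoids hard-coding pages 7-9.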

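5. Content extraction template

# Extract the investment strategy section
start_kw = '报告期内基金的投资策略和运作分析'
end_kw = '报告期内基金的业绩表现'
content = ''  # stays empty if the start keyword is missing
idx1 = full_text.find(start_kw)
if idx1 != -1:
    idx2 = full_text.find(end_kw, idx1)
    if idx2 == -1:
        # End keyword not found: fall back to a fixed 2500-character window
        idx2 = idx1 + 2500
    content = full_text[idx1:idx2]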
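
The template can be wrapped in a reusable function that accepts several possible end markers and stops at whichever comes first. This is a sketch under assumptions: using the outlook heading as a second end marker is a guess based on the keyword list above, and `extract_section` is a hypothetical name.

```python
START_KEYWORD = "报告期内基金的投资策略和运作分析"
# The second end marker is an assumption; adjust to the report layout you see.
END_KEYWORDS = (
    "报告期内基金的业绩表现",
    "管理人对宏观经济、证券市场及行业走势的简要展望",
)


def extract_section(full_text, start=START_KEYWORD, ends=END_KEYWORDS, max_chars=2500):
    """Return the text from `start` up to the nearest end marker.

    Falls back to a `max_chars` window when no end marker follows, and
    returns an empty string when `start` is absent.
    """
    begin = full_text.find(start)
    if begin == -1:
        return ""
    body_start = begin + len(start)
    positions = [full_text.find(end, body_start) for end in ends]
    found = [pos for pos in positions if pos != -1]
    stop = min(found) if found else begin + max_chars
    return full_text[begin:stop].strip()
```

Returning an empty string instead of raising lets the caller log the miss and move on to the next report.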

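FAQ

Q: The PDF is a scanned copy and no text can be extracted?

A: Use pdfplumber instead of PyMuPDF, and target pages 7, 8 and 9 precisely.

Q: The keyword is not matched?

A: Check whether the keyword differs in whitespace, and try different variants.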
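
One way to handle whitespace differences and multiple variants together is to compile each keyword into a regular expression that tolerates whitespace between any two characters, then take the earliest match. Both helper names below are hypothetical.

```python
import re


def flexible_pattern(keyword):
    """Compile a regex matching `keyword` with optional whitespace between characters."""
    chars = [re.escape(ch) for ch in keyword if not ch.isspace()]
    return re.compile(r"\s*".join(chars))


def find_first(text, variants):
    """Return (start_index, variant) for the earliest-matching variant, or None."""
    best = None
    for variant in variants:
        match = flexible_pattern(variant).search(text)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), variant)
    return best
```

Unlike stripping all whitespace first, this keeps indices valid for the original text, so the result can be used directly for slicing.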

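Q: Some reports are missing?

A: Eastmoney (东方财富) keeps only the last 4 years of reports; earlier reports have to be obtained through other channels.

Q: Network requests fail?

A: Add a delay of 1 to 2 seconds between requests, e.g. time.sleep(1) or time.sleep(2), to avoid being rate-limited.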
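
The delay advice can be combined with a simple retry, sketched below. `fetch_with_retry` is a hypothetical helper; the three attempts and the randomized 1-2 second pause are illustrative defaults.

```python
import random
import time


def fetch_with_retry(fetch, attempts=3, min_delay=1.0, max_delay=2.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds, pausing 1-2 s between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # narrow this to network errors in real use
            last_error = error
            if attempt < attempts - 1:
                sleep(random.uniform(min_delay, max_delay))
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without actually waiting.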

Output files

  • reports_{fund code}/ - the original report files
  • {fund name}_投资策略汇总.txt - the complete summary
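
Writing the summary file in chronological order might look like the sketch below. The layout of each entry (a date and title line followed by the extracted text) is an assumption, as is the `write_summary` helper; only the filename pattern comes from the list above.

```python
from pathlib import Path


def write_summary(entries, fund_name, out_dir="."):
    """Write (date, title, content) entries oldest first and return the file path."""
    path = Path(out_dir) / f"{fund_name}_投资策略汇总.txt"
    blocks = [
        f"{date} {title}\n{content.strip()}\n"
        for date, title, content in sorted(entries, key=lambda entry: entry[0])
    ]
    # Explicit UTF-8 so Chinese text is written correctly on every platform.
    path.write_text("\n".join(blocks), encoding="utf-8")
    return path
```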

Dependencies

pip install akshare pymupdf pdfplumber pandas requests

Created: 2026-03-08 Author: 有才
