Install
openclaw skills install vocational-ed-policy
Automatically scrapes vocational education policy documents and project application announcements from the official websites of the Ministry of Education, the Ministry of Human Resources and Social Security, and provincial education departments. Supports keyword filtering and periodic summaries.
Supported Sources:
Content Types:
Classification System:
- policy: Policy Documents
- project: Project Applications
- achievement: Teaching Achievement Awards
- integration: Industry-Education Integration
- certificate: 1+X Certificates
- double_high: Double High Plan
Filtering Capabilities:
Summary Features:
Chinese Examples:
# Scrape all policy documents from the last 30 days
python scripts/scrape_voc_ed_policy.py --days 30
# Filter by keywords (Double High Plan, industry-education integration)
python scripts/scrape_voc_ed_policy.py --keywords "双高计划" "产教融合" --days 30
# Filter by category (policy documents only)
python scripts/scrape_voc_ed_policy.py --category policy --days 7
# Combined filtering (keyword + category + time range)
python scripts/scrape_voc_ed_policy.py --keywords "1+X证书" --category certificate --days 14
# Save to a specified file
python scripts/scrape_voc_ed_policy.py --keywords "教学成果奖" --output results.json
English Examples:
# Scrape all policy documents from the last 30 days
python scripts/scrape_voc_ed_policy.py --days 30 --lang en
# Filter by keywords
python scripts/scrape_voc_ed_policy.py --keywords "双高计划" "产教融合" --days 30 --lang en
# Filter by category
python scripts/scrape_voc_ed_policy.py --category policy --days 7 --lang en
# Comprehensive filtering
python scripts/scrape_voc_ed_policy.py --keywords "1+X证书" --category certificate --days 14 --lang en
# Save to specified file
python scripts/scrape_voc_ed_policy.py --keywords "教学成果奖" --output results.json --lang en
| Parameter | Description | Example |
|---|---|---|
| --keywords | Keyword list | --keywords "双高计划" "产教融合" |
| --days | Look-back window in days (default 30) | --days 7 |
| --category | Category to filter by | --category policy |
| --output | Output file path | --output results.json |
| --lang | Language (zh/en) | --lang zh |
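The parameter table above can be read as a command-line interface. The following is a minimal sketch of how such flags might be declared with `argparse`; it is not the script's actual parser. The function name `build_parser`, the `CATEGORIES` list, and the `zh` default for `--lang` are assumptions made for illustration.

```python
import argparse

# Category names taken from the classification system; the real script may accept others.
CATEGORIES = ["policy", "project", "achievement", "integration", "certificate", "double_high"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Vocational education policy scraper (sketch)")
    parser.add_argument("--keywords", nargs="*", default=[], help="keywords to filter by")
    parser.add_argument("--days", type=int, default=30, help="look-back window in days")
    parser.add_argument("--category", choices=CATEGORIES, default=None, help="category to keep")
    parser.add_argument("--output", default=None, help="path of the JSON file to write")
    # Assumption: zh is the default language; the source does not state the default.
    parser.add_argument("--lang", choices=["zh", "en"], default="zh", help="output language")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--keywords", "双高计划", "产教融合", "--days", "7"])
print(args.keywords, args.days, args.category)
```

Using `nargs="*"` lets several keywords follow a single `--keywords` flag, which matches the way the examples pass them.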
Identify the content types, time range, keywords, and category to scrape.
Example:
Configure the parameters to match those requirements and run the scraping script.
python scripts/scrape_voc_ed_policy.py --keywords "双高计划" --category policy --days 30
After scraping completes, review the generated JSON file or the summary printed to the terminal.
Output Format:
{
  "websites_scraped": 3,
  "total_documents": 45,
  "results": [
    {
      "title": "教育部关于公布中国特色高水平高职学校和专业建设计划名单的通知",
      "url": "https://www.moe.gov.cn/...",
      "date": "2024-01-15",
      "source": "教育部",
      "category": "double_high",
      "keywords": ["双高计划", "高职学校"]
    }
  ],
  "errors": [],
  "timestamp": "2024-01-20T10:30:00",
  "filters": {
    "keywords": ["双高计划"],
    "days": 30,
    "category": "policy"
  }
}
Analyze the scraped results and generate summary reports.
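As one possible starting point for such a report, the sketch below counts documents per category and per source in a payload shaped like the documented output format. The function name `summarize` and the inline sample data are hypothetical; in practice the payload would come from `json.load` on the file written via `--output`.

```python
from collections import Counter


def summarize(payload: dict) -> dict:
    """Count documents per category and per source in a results payload."""
    docs = payload.get("results", [])
    return {
        "total": len(docs),
        "by_category": dict(Counter(d.get("category", "unknown") for d in docs)),
        "by_source": dict(Counter(d.get("source", "unknown") for d in docs)),
    }


# Hypothetical sample in the documented output shape (not real scraped data).
sample = {
    "results": [
        {"title": "Notice A", "source": "教育部", "category": "double_high"},
        {"title": "Notice B", "source": "教育部", "category": "policy"},
        {"title": "Notice C", "source": "人社部", "category": "policy"},
    ]
}

print(summarize(sample))
```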
Use cron to schedule recurring scraping jobs.
# Every day at 08:00, scrape policy documents from the last 30 days
0 8 * * * python /path/to/scripts/scrape_voc_ed_policy.py --days 30 --output /path/to/results/daily_$(date +\%Y\%m\%d).json
# Every Monday at 08:00, scrape policy documents from the last 7 days
0 8 * * 1 python /path/to/scripts/scrape_voc_ed_policy.py --days 7 --output /path/to/results/weekly_$(date +\%Y\%m\%d).json
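Add new website configurations to the script.
EDU_WEBSITES = {
    # Placeholder site name ("new website"); replace with the real source name
    "新增网站": {
        "base_url": "https://example.gov.cn",
        "policy_url": "https://example.gov.cn/policy/",
        # CSS selectors used to locate each field on the listing page
        "selectors": {
            "title": "a[title]",
            "date": ".date",
            "link": "a[href]"
        },
        # Site-level keywords ("vocational education", "policy")
        "keywords": ["职业教育", "政策"]
    }
}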
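Before adding an entry like the one above, it can help to check that it has the expected shape. This is a minimal sketch, assuming that every site entry needs the four keys and three selectors shown in the example; the names `REQUIRED_KEYS`, `REQUIRED_SELECTORS`, and `validate_site` are hypothetical and not part of the script.

```python
from urllib.parse import urlparse

# Assumption: these are the fields every site entry needs, based on the example entry.
REQUIRED_KEYS = {"base_url", "policy_url", "selectors", "keywords"}
REQUIRED_SELECTORS = {"title", "date", "link"}


def validate_site(name: str, config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the entry looks usable."""
    problems = []
    missing = REQUIRED_KEYS - set(config)
    if missing:
        problems.append(f"{name}: missing keys {sorted(missing)}")
    for key in ("base_url", "policy_url"):
        if key in config:
            parsed = urlparse(config[key])
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"{name}: {key} must be an absolute http(s) URL")
    missing_selectors = REQUIRED_SELECTORS - set(config.get("selectors", {}))
    if "selectors" in config and missing_selectors:
        problems.append(f"{name}: missing selectors {sorted(missing_selectors)}")
    return problems


example = {
    "base_url": "https://example.gov.cn",
    "policy_url": "https://example.gov.cn/policy/",
    "selectors": {"title": "a[title]", "date": ".date", "link": "a[href]"},
    "keywords": ["职业教育", "政策"],
}
print(validate_site("example", example))
```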
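Convert the JSON results to other formats (CSV, Markdown, HTML).
import json
import pandas as pd

# Load the scraper output (the file written via --output)
with open('results.json', encoding='utf-8') as f:
    results = json.load(f)

# Export to CSV (utf-8-sig adds a BOM so Excel displays Chinese text correctly)
df = pd.DataFrame(results['results'])
df.to_csv('results.csv', index=False, encoding='utf-8-sig')

# Export to Markdown
def to_markdown(results):
    """Render each scraped document as a Markdown section."""
    md = "# Vocational Education Policy Scraping Results\n\n"
    for item in results['results']:
        md += f"## {item['title']}\n"
        md += f"- **Source**: {item['source']}\n"
        md += f"- **Date**: {item['date']}\n"
        md += f"- **Link**: {item['url']}\n\n"
    return md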
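HTML is listed as a target format alongside CSV and Markdown. The sketch below shows one way an HTML export could look, using only the standard library; `to_html` is a hypothetical helper, not a function from the script, and the sample entry is invented for illustration.

```python
import html


def to_html(results: dict) -> str:
    """Render the scraped documents as an HTML list, escaping all scraped text."""
    rows = []
    for item in results["results"]:
        title = html.escape(item["title"])
        url = html.escape(item["url"], quote=True)
        source = html.escape(item["source"])
        date = html.escape(item["date"])
        rows.append(f'<li><a href="{url}">{title}</a> ({source}, {date})</li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


# Hypothetical entry; the angle brackets demonstrate escaping.
demo = {
    "results": [
        {"title": "Notice <draft>", "url": "https://example.gov.cn/a?x=1&y=2",
         "source": "教育部", "date": "2024-01-15"}
    ]
}
print(to_html(demo))
```

Escaping matters here because titles and URLs come from scraped pages and may contain characters that would otherwise break the markup.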
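Core scraping script, supporting:
Implementation details: See references/implementation-notes.md for complete technical implementation notes, date parsing patterns, filtering logic, and the ClawHub publishing workflow.
List of official websites of the Ministry of Education, the Ministry of Human Resources and Social Security, and provincial education departments, including:
Technical notes on the web-scraping implementation, including:
Internationalization helper module, supporting:
Translation files, including: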
- The display name is generated from the slug (e.g. voc-ed-policy → "Voc Ed Policy Scraper")
- The displayName, title, and display_name fields currently do not affect the actual display
- The description field holds the complete Chinese description
- What users see when installing: slug - system-generated English name + complete Chinese description
- ✅ voc-ed-policy, zhinao-vocational-policy | ❌ 职业教育政策, voc--ed-policy
Publishing Workflow:
cp -r ~/.hermes/skills/your-skill /mnt/c/Users/lenovo/Desktop/
powershell.exe -Command "clawhub publish 'C:\Users\lenovo\Desktop\your-skill' --version X.X.X"
rm -rf /mnt/c/Users/lenovo/Desktop/your-skill
Updating Existing Skills:
Add a delay between requests (time.sleep(1))
pip install requests beautifulsoup4
The script includes built-in filters that automatically exclude:
Use --keywords for additional filtering:
python scripts/scrape_voc_ed_policy.py --keywords "双高计划" "产教融合"
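To make the filtering behaviour concrete, here is a minimal sketch of how keyword, category, and look-back filters could be combined over records in the documented output shape. It is not the script's actual logic: the function name `filter_documents`, the `today` parameter, and the choice to keep a document when any keyword appears in its title or keyword list are assumptions.

```python
from datetime import date, timedelta


def filter_documents(docs, keywords=None, days=30, category=None, today=None):
    """Keep documents inside the look-back window that match the category and any keyword."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    kept = []
    for doc in docs:
        # Dates are assumed to be ISO strings such as "2024-01-15".
        if date.fromisoformat(doc["date"]) < cutoff:
            continue
        if category and doc.get("category") != category:
            continue
        # Assumption: a keyword matches if it occurs in the title or the keyword list.
        haystack = doc["title"] + " " + " ".join(doc.get("keywords", []))
        if keywords and not any(keyword in haystack for keyword in keywords):
            continue
        kept.append(doc)
    return kept


# Hypothetical records for illustration only.
docs = [
    {"title": "关于推进产教融合的通知", "date": "2024-01-15", "category": "integration", "keywords": []},
    {"title": "Old notice", "date": "2023-10-01", "category": "policy", "keywords": ["双高计划"]},
    {"title": "Notice A", "date": "2024-01-10", "category": "policy", "keywords": ["双高计划"]},
]
recent = filter_documents(docs, keywords=["双高计划"], days=30, today=date(2024, 1, 20))
print([doc["title"] for doc in recent])
```

Passing `today` explicitly keeps the example reproducible; a real run would rely on the current date.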
A ClawHub slug must:
Examples:
- Valid: voc-ed-policy
- Valid: zhinao-vocational-policy
- Invalid: 职业教育政策抓取 (contains Chinese)
- Invalid: voc--ed (consecutive hyphens)
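A slug can be checked locally before publishing. The sketch below encodes only what the examples imply (no Chinese characters, no consecutive hyphens) plus one assumption that goes beyond them: that slugs are limited to lowercase ASCII letters and digits. `SLUG_PATTERN` and `is_valid_slug` are hypothetical names, and ClawHub's actual rules may differ.

```python
import re

# Assumed pattern: groups of lowercase ASCII letters or digits joined by single hyphens.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if the whole string matches the assumed slug pattern."""
    return SLUG_PATTERN.fullmatch(slug) is not None


for candidate in ["voc-ed-policy", "zhinao-vocational-policy", "职业教育政策抓取", "voc--ed"]:
    print(candidate, is_valid_slug(candidate))
```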
ClawHub auto-generates the English display name from the slug; the displayName field does not affect what is shown.
Workaround:
- Write the complete Chinese description in the description field
- Choose a descriptive English slug (e.g. voc-ed-policy)
From WSL, publish through Windows PowerShell rather than calling clawhub directly. Use this workflow:
# 1. Copy the skill to the Windows Desktop
cp -r ~/.hermes/skills/your-skill /mnt/c/Users/lenovo/Desktop/
# 2. Publish using PowerShell
powershell.exe -Command "clawhub publish 'C:\Users\lenovo\Desktop\your-skill' --version 1.0.0"
# 3. Clean up the temporary copy
rm -rf /mnt/c/Users/lenovo/Desktop/your-skill
Note: If the slug conflicts, specify one with the --slug parameter or use clawhub skill rename.
Publishing with a slug that already exists reports a conflict. To update an existing skill:
# Option 1: Publish again with the same slug and a new version (overwrites)
clawhub publish 'C:\Users\lenovo\Desktop\your-skill' --version 1.1.0
# Option 2: If the skill needs a new name, rename it first, then publish
clawhub skill rename old-slug new-slug --yes
Key point: bump the version number and leave the slug in the name field unchanged.
Issues and improvement suggestions are welcome. Before submitting a PR, ensure:
Version: 1.0.0
Last Updated: 2024