Bosszp
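v1.0.0
A BOSS直聘 job-listing scraper and visualization tool. It uses Scrapy to crawl job postings, stores them in MySQL, and generates visual reports with Flask + Highcharts. Triggered when the user needs to bulk-collect BOSS直聘 job data, analyze salary distribution, company distribution, or job requirements, or generate a visual report.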
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The SKILL.md describes a Scrapy -> MySQL -> Flask pipeline, which is coherent with the named purpose (BOSS直聘 data scraper + visualization). However, the skill package contains no code files (spiders, clean_data.py, web/run.py) even though the instructions reference them, which leaves the runtime expectations incomplete. Overall, the capabilities align with the stated purpose but are under-specified.
Instruction Scope
Instructions tell the user to create/run a Scrapy project, run a spider against zhipin.com, and run external scripts (clean_data.py, run.py) that are not bundled. The steps instruct installing packages and running network I/O (scraping) but do not describe how database credentials are provided or what clean_data.py does. This is open-ended and could lead users to run or create code without clear safety or credential-handling guidance.
Install Mechanism
Although there is no formal install spec, the SKILL.md includes pip install commands. Most are expected (scrapy, flask, pandas, sqlalchemy, pymysql), but there is a likely-typo or unexpected command: 'pip install flash' (instead of 'flask'). Installing an unexpected PyPI package named 'flash' could pull an unrelated or malicious package. Any instructions that blindly pip install third-party packages without verifying versions/source are a risk.
Credentials
The skill requests no environment variables in metadata, yet the workflow requires MySQL access for importing data. The SKILL.md does not specify how DB credentials should be provided (env vars, config file, interactive input). Requiring database access in practice is proportional to the purpose, but the omission of explicit credential handling is a gap and could lead users to supply credentials insecurely.
Persistence & Privilege
The skill is instruction-only and has no install spec, does not request always:true, and does not claim to modify other skills or persist in the agent. No elevated persistence or privileges are requested.
What to consider before installing
This skill appears to be an instructions-only guide to building a Scrapy->MySQL->Flask pipeline for scraping BOSS直聘. Before you proceed:
1) Do not run pip install commands blindly; verify packages and versions. The SKILL.md has 'pip install flash', which is likely a typo for 'flask' and could install an unrelated PyPI package; correct and verify before installing.
2) The repository contains no spider code, no clean_data.py and no web/run.py, so you'll need to implement or inspect those scripts; do not run third-party scripts you didn't review.
3) The workflow requires MySQL credentials; decide how to provide them securely (use a least-privilege DB user, avoid embedding credentials in public files).
4) Scraping external sites can violate terms of service and legal/regulatory rules; check robots.txt and the site terms, and rate-limit requests to avoid being blocked.
5) Run in an isolated environment (virtualenv/container) and audit installed packages before use.
If you want a safer evaluation, request the actual spider and data-cleaning scripts so they can be reviewed for unwanted network/file/credential access.
Like a lobster shell, security has layers: review code before you run it.
latest
BOSS直聘 Data Scraping and Analysis (bosszp)
Requirements
- Python 3.x
- MySQL
- Scrapy
- Pandas + SQLAlchemy
- Flask + Highcharts
Install dependencies
pip install scrapy
pip install flask pandas sqlalchemy pymysql
pip install flash
Data collection
1. Create the Scrapy project
scrapy startproject bosszp
cd bosszp
scrapy genspider boss zhipin.com
2. Configure cookies and request headers
Edit settings.py:
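COOKIES_ENABLED = True
# USER_AGENTS is a project-defined list; Scrapy itself only reads USER_AGENT,
# so a downloader middleware has to pick from this list.
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
    ...
]
# Crawl delay in seconds (to avoid being blocked)
DOWNLOAD_DELAY = 2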
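One way to put the `USER_AGENTS` list to use is a small downloader middleware that picks a random entry for each request. The following is a minimal sketch, not the skill's own code: the class name `RandomUserAgentMiddleware` is hypothetical, and it assumes the list is defined in settings.py as shown.

```python
import random


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware: sets a random User-Agent per request."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook with the running crawler; read the custom list from settings.
        return cls(crawler.settings.getlist("USER_AGENTS"))

    def process_request(self, request, spider):
        # Leave the request untouched if no user agents were configured.
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

To take effect, the class would need to be registered in settings.py, for example `DOWNLOADER_MIDDLEWARES = {"bosszp.middlewares.RandomUserAgentMiddleware": 400}`, assuming it is placed in the project's middlewares.py.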
3. Run the spider
scrapy crawl boss -o output.csv
Data export
CSV format
岗位名称,岗位地区,薪资,公司名称,公司类型,公司规模,融资阶段,工作年限,学历,福利标签
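The `薪资` (salary) column is a text range, so analyzing the salary distribution requires turning it into numbers first. The sketch below assumes values look like `15-25K`, optionally followed by a suffix such as `·13薪`; that format is an assumption about the scraped data, and the helper name `parse_salary_k` is hypothetical.

```python
import re

# Assumed format: "<low>-<high>K", optionally followed by a suffix such as "·13薪".
_SALARY_RE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*[Kk]")


def parse_salary_k(text):
    """Return (low, high) in thousands (K), or None if the text does not match."""
    match = _SALARY_RE.match(text or "")
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))
```

Values that do not fit the assumed pattern come back as `None`, so the cleaning step can decide whether to drop or inspect them rather than silently storing a wrong number.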
Import into MySQL
python clean_data.py # clean the data and import it into MySQL
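The instructions do not say how clean_data.py obtains its MySQL credentials. One option is to read them from environment variables and build a SQLAlchemy connection URL for the PyMySQL driver. This is a sketch under that assumption: the variable names (`BOSSZP_DB_USER` and so on), the defaults, and the function name are all hypothetical.

```python
import os
from urllib.parse import quote_plus


def mysql_url_from_env(env=None):
    """Build a SQLAlchemy URL for the PyMySQL driver from environment variables."""
    env = os.environ if env is None else env
    user = env["BOSSZP_DB_USER"]          # required; raises KeyError if missing
    password = env["BOSSZP_DB_PASSWORD"]  # required; raises KeyError if missing
    host = env.get("BOSSZP_DB_HOST", "127.0.0.1")
    port = env.get("BOSSZP_DB_PORT", "3306")
    name = env.get("BOSSZP_DB_NAME", "bosszp")
    # quote_plus escapes characters such as "@" or "/" that would break the URL.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}?charset=utf8mb4"
    )
```

The resulting string can be passed to `sqlalchemy.create_engine(...)`, and the engine to pandas `DataFrame.to_sql(...)`, which keeps credentials out of the script and out of version control.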
Visualization dashboard
cd web
export FLASK_APP=run.py
flask run --port 8080
# Then open http://127.0.0.1:8080/
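The dashboard includes:
- Pie chart: distribution of company funding stages
- Bar chart: salary distribution, Top 10
- Word cloud: job benefit tags
- Table: Top 10 companies (by number of postings)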
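The pie chart and the Top 10 table both reduce to counting rows per category. A minimal sketch of those two aggregations, assuming each posting is a dict keyed by the field names `finance_stage` and `com_name`; the function names are hypothetical and the sample rows are made up for illustration.

```python
from collections import Counter


def count_by(rows, field):
    """Count rows per distinct value of `field`, skipping empty values."""
    return Counter(row[field] for row in rows if row.get(field))


def top_companies(rows, n=10):
    """Return [(com_name, job_count), ...] for the n companies with the most postings."""
    return count_by(rows, "com_name").most_common(n)


# Made-up sample rows, for illustration only.
sample = [
    {"com_name": "A", "finance_stage": "Series A"},
    {"com_name": "A", "finance_stage": "Series A"},
    {"com_name": "B", "finance_stage": "Listed"},
]
stage_counts = count_by(sample, "finance_stage")  # data for the pie chart
ranking = top_companies(sample)                   # data for the Top 10 table
```

In the real pipeline the same counts could just as well come from SQL `GROUP BY` queries against MySQL; the in-memory version is only meant to show the shape of the data each chart needs.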
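Field reference
| Field | Description |
|---|---|
| job_name | Job title |
| job_area | Work location |
| job_salary | Salary range |
| com_name | Company name |
| com_type | Company type (state-owned, private, etc.) |
| com_size | Company size |
| finance_stage | Funding stage |
| work_year | Required years of experience |
| education | Required education level |
| job_benefits | Benefit tags |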
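The CSV header uses Chinese column names while the table above uses English field names, so a cleaning script needs some mapping between the two. The sketch below infers the pairing by matching the two lists in order, which is an assumption; `COLUMN_MAP` and `read_rows` are hypothetical names and the demo row is made up.

```python
import csv
import io

# CSV header -> field name; the pairing is inferred by matching the two lists in order.
COLUMN_MAP = {
    "岗位名称": "job_name",
    "岗位地区": "job_area",
    "薪资": "job_salary",
    "公司名称": "com_name",
    "公司类型": "com_type",
    "公司规模": "com_size",
    "融资阶段": "finance_stage",
    "工作年限": "work_year",
    "学历": "education",
    "福利标签": "job_benefits",
}


def read_rows(csv_file):
    """Yield one dict per CSV row, keyed by field name instead of the CSV header."""
    for raw in csv.DictReader(csv_file):
        yield {
            COLUMN_MAP[key]: (value or "").strip()
            for key, value in raw.items()
            if key in COLUMN_MAP
        }


# Demo with a made-up row; a real run would open output.csv instead.
demo_csv = io.StringIO(
    ",".join(COLUMN_MAP) + "\n"
    "Engineer,Beijing,15-25K,Acme,Private,100-499,Series A,3-5,Bachelor,Insurance\n"
)
demo_rows = list(read_rows(demo_csv))
```

With a real export, the file would be opened with `open("output.csv", newline="", encoding="utf-8")` and passed to `read_rows` in the same way.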
