Bosszp

v1.0.0

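A scraper and visualization tool for BOSS直聘 job postings. It crawls job data with Scrapy, stores it in MySQL, and renders a visual report with Flask + Highcharts. Triggered when the user needs to bulk-collect BOSS直聘 job data, analyze salary distribution, company distribution, or job requirements, or generate a visual report.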


by @aitowerofbabel-lang

Security Scan

VirusTotal: Benign

OpenClaw: Suspicious (medium confidence)


Purpose & Capability

The SKILL.md describes a Scrapy -> MySQL -> Flask pipeline, which is coherent with the named purpose (BOSS直聘 data scraper + visualization). However, the skill package contains no code files (spiders, clean_data.py, web/run.py), yet the instructions reference them, which leaves the runtime expectations incomplete. Overall, the capabilities align with the stated purpose but are under-specified.

Instruction Scope

Instructions tell the user to create/run a Scrapy project, run a spider against zhipin.com, and run external scripts (clean_data.py, run.py) that are not bundled. The steps instruct installing packages and running network I/O (scraping) but do not describe how database credentials are provided or what clean_data.py does. This is open-ended and could lead users to run or create code without clear safety or credential-handling guidance.

Install Mechanism

Although there is no formal install spec, the SKILL.md includes pip install commands. Most are expected (scrapy, flask, pandas, sqlalchemy, pymysql), but there is a likely-typo or unexpected command: 'pip install flash' (instead of 'flask'). Installing an unexpected PyPI package named 'flash' could pull an unrelated or malicious package. Any instructions that blindly pip install third-party packages without verifying versions/source are a risk.

Credentials

The skill requests no environment variables in metadata, yet the workflow requires MySQL access for importing data. The SKILL.md does not specify how DB credentials should be provided (env vars, config file, interactive input). Requiring database access in practice is proportional to the purpose, but the omission of explicit credential handling is a gap and could lead users to supply credentials insecurely.
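One way to close this gap is to read the MySQL credentials from environment variables and build the SQLAlchemy URL at runtime. The following is a minimal sketch, not something SKILL.md prescribes: the `BOSSZP_DB_*` variable names, the default host, port, and database name are all assumptions made for illustration.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(env=None):
    """Build a SQLAlchemy URL for the pymysql driver from environment variables.

    The BOSSZP_DB_* names are hypothetical; SKILL.md does not define any.
    """
    env = os.environ if env is None else env
    user = env["BOSSZP_DB_USER"]          # required: raises KeyError if missing
    password = env["BOSSZP_DB_PASSWORD"]  # required: raises KeyError if missing
    host = env.get("BOSSZP_DB_HOST", "127.0.0.1")  # assumed default
    port = env.get("BOSSZP_DB_PORT", "3306")       # MySQL's usual port
    name = env.get("BOSSZP_DB_NAME", "bosszp")     # assumed database name
    # Percent-encode user and password so characters such as '@' or '/'
    # cannot break the URL structure.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}?charset=utf8mb4"
    )
```

The resulting string can be passed to `sqlalchemy.create_engine(...)`. Pairing this with a least-privilege MySQL user keeps the password out of any file that might be shared.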


Persistence & Privilege

The skill is instruction-only and has no install spec, does not request always:true, and does not claim to modify other skills or persist in the agent. No elevated persistence or privileges are requested.

What to consider before installing

This skill appears to be an instructions-only guide to building a Scrapy->MySQL->Flask pipeline for scraping BOSS直聘. Before you proceed:

1) Do not run pip install commands blindly; verify packages and versions. The SKILL.md has 'pip install flash', which is likely a typo for 'flask' and could install an unrelated PyPI package; correct and verify before installing.
2) The repository contains no spider code, no clean_data.py, and no web/run.py. You'll need to implement or inspect those scripts; do not run third-party scripts you didn't review.
3) The workflow requires MySQL credentials; decide how to provide them securely (use a least-privilege DB user, avoid embedding credentials in public files).
4) Scraping external sites can violate terms of service and legal/regulatory rules; check robots.txt and the site terms, and rate-limit requests to avoid being blocked.
5) Run in an isolated environment (virtualenv/container) and audit installed packages before use.

If you want a safer evaluation, request the actual spider and data-cleaning scripts so they can be reviewed for unwanted network/file/credential access.
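The robots.txt check mentioned above can be done with Python's standard `urllib.robotparser`. This is a minimal sketch that works on robots.txt text you have already downloaded; `SAMPLE_ROBOTS` is a made-up example and does not reflect the actual rules of zhipin.com or any other site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

A robots.txt pass is only one signal; the site's terms of service still need to be read separately.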

Like a lobster shell, security has layers — review code before you run it.

latest: vk976m8vxb4w4ntvbnfnzhmpssn83qd0y

75 downloads

0 stars

1 version

Updated 3w ago


MIT-0

BOSS直聘 Data Scraping and Analysis (bosszp)

Requirements

Python 3.x
MySQL
Scrapy
Pandas + SQLAlchemy
Flask + Highcharts

Install dependencies

pip install scrapy
pip install flask pandas sqlalchemy pymysql
# pip install flash   # likely a typo for flask (installed by the line above); not needed

Data collection

1. Create the Scrapy project

scrapy startproject bosszp
cd bosszp
scrapy genspider boss zhipin.com

2. Configure cookies and request headers

Edit settings.py:

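COOKIES_ENABLED = True
# Custom setting: Scrapy itself only reads USER_AGENT (singular), so a
# downloader middleware has to pick a value from this list.
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
    ...
]

# Download delay in seconds (to avoid being blocked)
DOWNLOAD_DELAY = 2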
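Scrapy has no built-in `USER_AGENTS` setting, so the list above only takes effect if a downloader middleware reads it. A minimal sketch of such a middleware follows; the class name `RandomUserAgentMiddleware` is an assumption, not part of the skill, and it uses the long-documented `process_request(request, spider)` signature.

```python
import random


class RandomUserAgentMiddleware:
    """Set a random User-Agent from the USER_AGENTS setting on each request."""

    def __init__(self, user_agents):
        # Keep only strings, so a literal `...` placeholder left in the
        # settings list is ignored instead of being sent as a header.
        self.user_agents = [ua for ua in user_agents if isinstance(ua, str)]

    @classmethod
    def from_crawler(cls, crawler):
        # Settings.getlist returns the setting as a list ([] if unset).
        return cls(crawler.settings.getlist("USER_AGENTS"))

    def process_request(self, request, spider):
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # let Scrapy continue processing the request
```

If the class is saved in the generated `bosszp/middlewares.py` (an assumed location), it could be enabled in settings.py with `DOWNLOADER_MIDDLEWARES = {"bosszp.middlewares.RandomUserAgentMiddleware": 400}`. A priority below 500 should make it run before the built-in UserAgentMiddleware, which only fills in the header when it is missing.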

3. Run the spider

scrapy crawl boss -o output.csv

Data export

CSV format

岗位名称,岗位地区,薪资,公司名称,公司类型,公司规模,融资阶段,工作年限,学历,福利标签
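The header above has ten columns, matching the ten fields documented for this skill (job_name through job_benefits). A minimal sketch of reading such a CSV and renaming the columns to those field names follows. The pairing is inferred from the column order and wording and is an assumption, and `read_jobs` is a hypothetical helper.

```python
import csv
import io

# Assumed mapping from the Chinese CSV header to the documented field names.
HEADER_TO_FIELD = {
    "岗位名称": "job_name",
    "岗位地区": "job_area",
    "薪资": "job_salary",
    "公司名称": "com_name",
    "公司类型": "com_type",
    "公司规模": "com_size",
    "融资阶段": "finance_stage",
    "工作年限": "work_year",
    "学历": "education",
    "福利标签": "job_benefits",
}


def read_jobs(csv_text):
    """Parse CSV text and return one dict per row, keyed by field name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        # Unknown columns keep their original header.
        {HEADER_TO_FIELD.get(key, key): value for key, value in row.items()}
        for row in reader
    ]
```

For a file on disk, `read_jobs(open("output.csv", encoding="utf-8").read())` would apply the same mapping.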

Import into MySQL

python clean_data.py   # data cleaning + import into MySQL
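clean_data.py is not bundled with the skill, so its behavior is unknown. As an illustration of the kind of cleaning such a script might do, here is a sketch that parses a salary string into numbers. It assumes salaries look like `15-25K` or `15-25K·13薪` (thousands of yuan per month, optionally with a months-per-year suffix) and defaults to 12 months when no suffix is present; both the format and the default are assumptions, and `parse_monthly_salary` is a hypothetical name.

```python
import re

# Assumed format: "<low>-<high>K", optionally followed by "·<n>薪"
# (n salary months per year).
_SALARY_RE = re.compile(r"^(\d+)-(\d+)K(?:·(\d+)薪)?$", re.IGNORECASE)


def parse_monthly_salary(text):
    """Return (low, high, months_per_year) in yuan, or None if unrecognized."""
    match = _SALARY_RE.match(text.strip())
    if match is None:
        return None  # e.g. daily or hourly rates, left for manual handling
    low, high, months = match.groups()
    return int(low) * 1000, int(high) * 1000, int(months) if months else 12
```

Returning `None` for unrecognized formats keeps unexpected values visible instead of silently coercing them into numbers.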

Visualization dashboard

cd web
export FLASK_APP=run.py
flask run --port 8080
# Open http://127.0.0.1:8080/

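The dashboard includes:

Pie chart: distribution of company funding stages
Bar chart: salary distribution, Top 10
Word cloud: job benefit tags
Table: Top 10 companies (by number of postings)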
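The pie chart and the company table both reduce to counting rows by one field. The following sketch shows how that aggregation might look on a list of row dicts keyed by the documented field names. The function names are hypothetical, and the `{"name": ..., "y": ...}` shape follows the point format Highcharts accepts for pie series; how web/run.py actually prepares its data is not known.

```python
from collections import Counter


def pie_series(rows, field="finance_stage"):
    """Count rows per value of `field`, as Highcharts-style pie points."""
    counts = Counter(row[field] for row in rows if row.get(field))
    return [{"name": name, "y": count} for name, count in counts.most_common()]


def top_companies(rows, n=10):
    """Return the n companies with the most postings as (name, count) pairs."""
    counts = Counter(row["com_name"] for row in rows if row.get("com_name"))
    return counts.most_common(n)
```

Rows with an empty or missing value are skipped so they do not appear as an unnamed slice.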

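Field reference

Field	Description
job_name	Job title
job_area	Work location
job_salary	Salary range
com_name	Company name
com_type	Company type (state-owned, private, etc.)
com_size	Company size
finance_stage	Funding stage
work_year	Required years of experience
education	Required education level
job_benefits	Benefit tags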
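The fields above map naturally onto an item type for the spider. Scrapy accepts dataclass objects as items, so one possible sketch is a plain dataclass. The class name `JobItem` and the choice of `str` for every field are assumptions, since the skill ships no item definition.

```python
from dataclasses import asdict, dataclass


@dataclass
class JobItem:
    """One scraped job posting; all values kept as raw strings."""

    job_name: str       # job title
    job_area: str       # work location
    job_salary: str     # salary range, unparsed
    com_name: str       # company name
    com_type: str       # company type
    com_size: str       # company size
    finance_stage: str  # funding stage
    work_year: str      # required years of experience
    education: str      # required education level
    job_benefits: str   # benefit tags


def item_to_row(item):
    """Convert an item to a dict in field-declaration order."""
    return asdict(item)
```

Keeping every field as a string defers parsing to the cleaning step, so the spider stays simple.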
