Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Bosszp

v1.0.0

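A crawler and visualization tool for BOSS直聘 job-posting data. It uses Scrapy to crawl job listings, stores them in MySQL, and generates a visual report with Flask + Highcharts. It is triggered when the user needs to batch-collect BOSS直聘 job data, analyze salary distribution, company distribution, or job requirements, or generate a visual report.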

0 stars · 75 downloads · 0 current · 0 all-time
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The SKILL.md describes a Scrapy -> MySQL -> Flask pipeline, which is coherent with the named purpose (a BOSS直聘 data crawler plus visualization). However, the skill package contains no code files (spiders, clean_data.py, web/run.py) even though the instructions reference them, which leaves the runtime expectations incomplete. Overall, the capabilities align with the stated purpose but are under-specified.
Instruction Scope
Instructions tell the user to create/run a Scrapy project, run a spider against zhipin.com, and run external scripts (clean_data.py, run.py) that are not bundled. The steps instruct installing packages and running network I/O (scraping) but do not describe how database credentials are provided or what clean_data.py does. This is open-ended and could lead users to run or create code without clear safety or credential-handling guidance.
Install Mechanism
Although there is no formal install spec, the SKILL.md includes pip install commands. Most are expected (scrapy, flask, pandas, sqlalchemy, pymysql), but there is a likely-typo or unexpected command: 'pip install flash' (instead of 'flask'). Installing an unexpected PyPI package named 'flash' could pull an unrelated or malicious package. Any instructions that blindly pip install third-party packages without verifying versions/source are a risk.
Credentials
The skill requests no environment variables in metadata, yet the workflow requires MySQL access for importing data. The SKILL.md does not specify how DB credentials should be provided (env vars, config file, interactive input). Requiring database access in practice is proportional to the purpose, but the omission of explicit credential handling is a gap and could lead users to supply credentials insecurely.
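One way to close this gap is to read the database credentials from environment variables at run time instead of writing them into a file. The sketch below is only an illustration: the `BOSSZP_DB_*` variable names and the `build_mysql_url` helper are hypothetical and do not come from the skill, and the defaults assume a local MySQL server on port 3306.

```python
import os
from urllib.parse import quote


def build_mysql_url(env=None):
    """Build a SQLAlchemy-style MySQL URL from environment variables.

    The BOSSZP_DB_* names are hypothetical; the skill itself defines none.
    """
    env = os.environ if env is None else env
    required = ("BOSSZP_DB_USER", "BOSSZP_DB_PASSWORD", "BOSSZP_DB_NAME")
    missing = [key for key in required if not env.get(key)]
    if missing:
        # Fail early rather than falling back to a hard-coded password.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    # Percent-encode user and password so characters like '@' or '/' are safe.
    user = quote(env["BOSSZP_DB_USER"], safe="")
    password = quote(env["BOSSZP_DB_PASSWORD"], safe="")
    host = env.get("BOSSZP_DB_HOST", "127.0.0.1")  # assumed default
    port = env.get("BOSSZP_DB_PORT", "3306")       # assumed default
    name = env["BOSSZP_DB_NAME"]
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{name}?charset=utf8mb4"
```

The resulting string could be passed to SQLAlchemy's `create_engine`. Pairing it with a least-privilege MySQL user limits the damage if the variables leak.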
Persistence & Privilege
The skill is instruction-only and has no install spec, does not request always:true, and does not claim to modify other skills or persist in the agent. No elevated persistence or privileges are requested.
What to consider before installing
This skill appears to be an instructions-only guide to building a Scrapy -> MySQL -> Flask pipeline for scraping BOSS直聘. Before you proceed:

1. Do not run pip install commands blindly; verify packages and versions. The SKILL.md has 'pip install flash', which is likely a typo for 'flask' and could install an unrelated PyPI package; correct and verify it before installing.
2. The repository contains no spider code, no clean_data.py, and no web/run.py. You will need to implement or inspect those scripts; do not run third-party scripts you have not reviewed.
3. The workflow requires MySQL credentials; decide how to provide them securely (use a least-privilege DB user and avoid embedding credentials in public files).
4. Scraping external sites can violate terms of service and legal or regulatory rules; check robots.txt and the site terms, and rate-limit requests to avoid being blocked.
5. Run in an isolated environment (virtualenv or container) and audit installed packages before use.

If you want a safer evaluation, request the actual spider and data-cleaning scripts so they can be reviewed for unwanted network, file, or credential access.
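The robots.txt check mentioned above can be automated with Python's standard library. This is a minimal sketch: `is_allowed` is a hypothetical helper, and `SAMPLE_ROBOTS` is made-up content for illustration, not the actual robots.txt of zhipin.com.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt used only to demonstrate the check.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

In a real run, the text would be downloaded from the target site's /robots.txt first. Scrapy also has a `ROBOTSTXT_OBEY` setting that applies the same kind of check automatically.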

Like a lobster shell, security has layers — review code before you run it.

latest vk976m8vxb4w4ntvbnfnzhmpssn83qd0y
75 downloads
0 stars
1 version
Updated 3w ago
v1.0.0
MIT-0

BOSS直聘 Data Crawling and Analysis (bosszp)

Requirements

  • Python 3.x
  • MySQL
  • Scrapy
  • Pandas + SQLAlchemy
  • Flask + Highcharts

Install dependencies

pip install scrapy
pip install flask pandas sqlalchemy pymysql
# pip install flash   # likely a typo for flask (already installed above); verify the package before running

Data collection

1. Create the Scrapy project

scrapy startproject bosszp
cd bosszp
scrapy genspider boss zhipin.com

2. Configure cookies and request headers

Edit settings.py:

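COOKIES_ENABLED = True
# Note: USER_AGENTS is not a built-in Scrapy setting (the built-in one is USER_AGENT);
# it only takes effect if a custom downloader middleware reads it.
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
    ...
]

# Crawl delay in seconds (to avoid being blocked)
DOWNLOAD_DELAY = 2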
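Scrapy has a built-in `USER_AGENT` setting but no built-in `USER_AGENTS` list, so the list in settings.py needs a downloader middleware to read it. The skill does not ship one. The sketch below shows one possible shape, assuming the class lives in the `bosszp/middlewares.py` file that `scrapy startproject bosszp` generates; the class name `RandomUserAgentMiddleware` is hypothetical.

```python
import random


class RandomUserAgentMiddleware:
    """Downloader middleware that picks a User-Agent per request from USER_AGENTS."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware; getlist() returns
        # an empty list when USER_AGENTS is not defined in settings.py.
        return cls(crawler.settings.getlist("USER_AGENTS"))

    def process_request(self, request, spider):
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # returning None lets Scrapy continue handling the request


# Assumed registration in settings.py (module path is an assumption):
# DOWNLOADER_MIDDLEWARES = {
#     "bosszp.middlewares.RandomUserAgentMiddleware": 400,
# }
```

Priority 400 places it before Scrapy's own UserAgentMiddleware (500 by default), which only sets the header when it is still missing.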

3. Run the spider

scrapy crawl boss -o output.csv

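Data export

CSV format

岗位名称,岗位地区,薪资,公司名称,公司类型,公司规模,融资阶段,工作年限,学历,福利标签
(job title, job location, salary, company name, company type, company size, funding stage, years of experience, education, benefit tags)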

Import into MySQL

python clean_data.py   # data cleaning + import into MySQL
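clean_data.py is not bundled with the skill, so what it does is unknown. As an illustration of what a cleaning step might involve, the sketch below splits a salary string into numeric bounds. It assumes salaries look like "15-25K" (optionally followed by extra text); that format, the helper names, and the `salary_min_k` / `salary_max_k` columns are assumptions, not part of the skill.

```python
import re
from typing import Optional, Tuple

# Matches strings that start with "<low>-<high>K", e.g. "15-25K" (assumed format).
_SALARY_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*[kK]")


def parse_salary_k(text: str) -> Optional[Tuple[float, float]]:
    """Return (low, high) in thousands, or None if the text does not match."""
    match = _SALARY_RE.match(text or "")
    if match is None:
        return None
    low, high = float(match.group(1)), float(match.group(2))
    return (low, high) if low <= high else None


def clean_row(row: dict) -> dict:
    """Strip whitespace from every field and add numeric salary bounds."""
    cleaned = {key: (value or "").strip() for key, value in row.items()}
    bounds = parse_salary_k(cleaned.get("job_salary", ""))
    # Hypothetical extra columns; None marks salaries that could not be parsed.
    cleaned["salary_min_k"], cleaned["salary_max_k"] = bounds if bounds else (None, None)
    return cleaned
```

Rows cleaned this way could then be loaded into MySQL, for example with pandas `DataFrame.to_sql` on a SQLAlchemy engine. Unparseable salaries are kept as None rather than dropped, so they can be inspected later.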

Visualization dashboard

cd web
export FLASK_APP=run.py
flask run --port 8080   # flask run listens on port 5000 unless a port is given
# Visit http://127.0.0.1:8080/

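The dashboard includes:

  • Pie chart: distribution of company funding stages
  • Bar chart: top 10 salary distribution
  • Word cloud: job benefit tags
  • Table: top 10 companies (by number of postings)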
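The aggregation behind two of these views can be sketched with the standard library alone. web/run.py is not bundled, so this is not the skill's implementation: `pie_series` and `top_companies` are hypothetical helpers, and the rows are assumed to be dicts keyed by the field names the skill documents (`finance_stage`, `com_name`).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pie_series(rows: Iterable[Dict[str, str]], field: str = "finance_stage") -> List[dict]:
    """Count values of a field and shape them as Highcharts pie points."""
    counts = Counter(row[field] for row in rows if row.get(field))
    # Highcharts pie series accept points of the form {"name": ..., "y": ...}.
    return [{"name": name, "y": count} for name, count in counts.most_common()]


def top_companies(rows: Iterable[Dict[str, str]], limit: int = 10) -> List[Tuple[str, int]]:
    """Return the companies with the most postings, largest first."""
    counts = Counter(row["com_name"] for row in rows if row.get("com_name"))
    return counts.most_common(limit)
```

A Flask view could serialize either result to JSON for the front end. Rows with an empty value for the field are skipped rather than counted as a separate category.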

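Field reference

Field          Description
job_name       Job title
job_area       Work location
job_salary     Salary range
com_name       Company name
com_type       Company type (state-owned, private, etc.)
com_size       Company size
finance_stage  Funding stage
work_year      Required years of experience
education      Education requirement
job_benefits   Benefit tags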
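The CSV columns listed under the CSV format correspond one-to-one, in order, to these field names. If the exported file uses the Chinese column headers (an assumption; the skill does not say which header row the spider writes), a small mapping can normalize them when reading. `COLUMN_TO_FIELD` and `read_rows` are hypothetical names.

```python
import csv
import io

# Chinese CSV column header -> field name, following the order in the skill.
COLUMN_TO_FIELD = {
    "岗位名称": "job_name",
    "岗位地区": "job_area",
    "薪资": "job_salary",
    "公司名称": "com_name",
    "公司类型": "com_type",
    "公司规模": "com_size",
    "融资阶段": "finance_stage",
    "工作年限": "work_year",
    "学历": "education",
    "福利标签": "job_benefits",
}


def read_rows(csv_text: str) -> list:
    """Parse CSV text and rename known Chinese headers to field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Unknown headers are kept unchanged so no column is silently dropped.
    return [
        {COLUMN_TO_FIELD.get(column, column): value for column, value in row.items()}
        for row in reader
    ]
```

For a file on disk, the text could come from `open("output.csv", encoding="utf-8", newline="").read()`.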
