Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

pepper-oil-scraper

v1.0.0

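A dedicated skill for scraping data on the Sichuan pepper oil (花椒油) and green Sichuan pepper oil (藤椒油) industry chain. It covers market size, raw-material prices, corporate financial reports, imports and exports, industry reports, competitive landscape and other data dimensions, with built-in scraper adapters for 20+ key websites. The skill is triggered when the user needs to collect industry data related to Sichuan pepper / green Sichuan pepper / seasoning oil / Sichuan Pepper Oil. Even if the user only says "scrape data", "grab reports" or "collect prices", as long as...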

Install

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for majorlau/pepper-oil-scraper.

Install the skill "pepper-oil-scraper" (majorlau/pepper-oil-scraper) from ClawHub.
Skill page: https://clawhub.ai/majorlau/pepper-oil-scraper
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: python3, pip
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

openclaw skills install pepper-oil-scraper

ClawHub CLI

npx clawhub@latest install pepper-oil-scraper
Security Scan
VirusTotal
Benign
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name and description claim a Python-based scraper for Sichuan pepper / pepper-oil industry data from 20+ sites. The package contains many Python scraper adapters, a main scheduler, data-cleaning and export tools, and a config/targets.json file listing the sites, all consistent with the stated purpose.
Instruction Scope
SKILL.md gives concrete pip install commands and examples for running the Python scripts. The runtime instructions describe only scraping public websites and producing local JSON/XLSX outputs; they do not instruct reading unrelated system files or sending data to hidden endpoints. They do suggest optional proxy and Playwright usage for JS-heavy sites.
! Install Mechanism
The metadata.install entry uses kind: "node" (id: pip-deps), even though the project is pure Python and SKILL.md instructs pip installs. This mismatch looks like a packaging/metadata error and may mean the declared install step will not run as intended. SKILL.md itself only asks the user to run pip install (no external archive downloads or obscure URLs), so the risk is low. Still, the metadata inconsistency should be fixed, or the user should install the Python dependencies manually in a virtual environment.
Credentials
The skill requests no environment variables or credentials. The code reads only its included config/targets.json and writes outputs to the configured output directory. No credentials or unrelated secrets are requested or referenced in the provided files.
Persistence & Privilege
Flags are default (always:false, user-invocable:true). The skill does not request persistent platform privileges or modify other skills/configs. It runs local Python scripts and saves results to the filesystem only.
What to consider before installing
This package appears to be a legitimate multi-site Python web scraper for Sichuan pepper / pepper-oil industry data, but review the following before installing:

  • Install step mismatch: the skill metadata declares an install of kind "node" while the project is Python. Don't rely on the metadata installer; create a Python virtual environment and run the SKILL.md pip install command yourself (pip install requests beautifulsoup4 lxml pandas openpyxl aiohttp fake-useragent). Avoid --break-system-packages unless you understand its effects.
  • Inspect config/targets.json before running: it can contain the site list, proxy settings, company lists or HS codes. Make sure no private or unexpected endpoints or proxies are configured.
  • Run in an isolated environment (container or VM) and as a non-root user to limit the blast radius of mistakes.
  • Legal and ethical caution: the tool is a crawler. Check each target site's robots.txt and terms of service, and avoid scraping pages that require authentication or sit behind paywalls. The code includes techniques for evading anti-scraping measures (fake-useragent, proxy support, delay/backoff); use them carefully and lawfully.
  • Optional dependencies: the docs mention Playwright for JS-heavy sites. It requires a separate, potentially large installation, so install it only if needed.
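A minimal pre-flight sketch for the config and robots.txt checks above. The schema of config/targets.json is not published on this page, so the sketch assumes nothing about it and walks the parsed JSON generically. EXPECTED_DOMAINS is copied from the site tables in SKILL.md; audit_config, audit_file and allowed_by_robots are hypothetical helpers, not part of the skill.

```python
import json
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist: the 26 domains listed in the SKILL.md site tables.
EXPECTED_DOMAINS = {
    "cnhnb.com", "ymt.com", "indices.cnfin.com", "860938.cn", "huajiao.cn",
    "chinabaogao.com", "askci.com", "chyxx.com", "qianzhan.com", "chinairn.com",
    "gonyn.com", "cninfo.com.cn", "eastmoney.com", "finance.sina.com.cn",
    "oilcn.com", "forestry.gov.cn", "moa.gov.cn", "customs.gov.cn",
    "openstd.samr.gov.cn", "36kr.com", "jiemian.com", "cbndata.com", "cnr.cn",
    "businessresearchinsights.com", "verifiedmarketreports.com", "wiseguyreports.com",
}


def walk(node, path=()):
    """Yield (path, value) for every scalar leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (str(key),))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, path + (str(index),))
    else:
        yield path, node


def host_is_expected(host):
    """True if host is an allowlisted domain or one of its subdomains."""
    return any(host == d or host.endswith("." + d) for d in EXPECTED_DOMAINS)


def audit_config(config):
    """Return human-readable findings: proxy settings and off-list URLs."""
    findings = []
    for path, value in walk(config):
        dotted = ".".join(path)
        # Any non-empty value under a key containing "proxy" is worth a look.
        if any("proxy" in part.lower() for part in path) and value not in (None, "", False):
            findings.append(f"proxy setting at {dotted}: {value!r}")
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            host = (urlparse(value).hostname or "").lower()
            if not host_is_expected(host):
                findings.append(f"unexpected host at {dotted}: {host}")
    return findings


def audit_file(path="config/targets.json"):
    """Load a targets file (default path assumed) and audit it."""
    with open(path, encoding="utf-8") as fh:
        return audit_config(json.load(fh))


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

RobotFileParser can also fetch a live file with set_url() and read(); parsing text that was fetched separately keeps the check usable offline.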

Runtime requirements

🌶️ Clawdis
Bins: python3, pip

Install

Install Python dependencies
latest vk97f26875p7ntfwgnt4kf26fq9839s6c
119 downloads
0 stars
1 version
Updated 1mo ago
v1.0.0
MIT-0

Sichuan Pepper Oil / Green Sichuan Pepper Oil Industry Data Scraper Skill

Overview

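This skill provides a complete Python scraping toolkit for collecting Sichuan pepper / green Sichuan pepper industry-chain data from 20+ key data sources.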

Quick Start

# 1. Install dependencies
pip install requests beautifulsoup4 lxml pandas openpyxl aiohttp fake-useragent --break-system-packages

# 2. Run a full crawl
python scripts/main_crawler.py --all --output /home/claude/pepper_data/

# 3. Crawl by category
python scripts/main_crawler.py --category price    # raw-material prices
python scripts/main_crawler.py --category market   # industry reports
python scripts/main_crawler.py --category company  # company data
python scripts/main_crawler.py --category gov      # government data
python scripts/main_crawler.py --category media    # media coverage
python scripts/main_crawler.py --category global   # global market

# 4. Crawl a single site
python scripts/main_crawler.py --site cnhnb        # Huinong (cnhnb) prices
python scripts/main_crawler.py --site cnfin_index  # Xinhua Sichuan pepper price index

# 5. Export the report
python scripts/export_report.py --input /home/claude/pepper_data/ --output /mnt/user-data/outputs/花椒产业数据.xlsx
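The quick start implies that main_crawler.py takes exactly one of --all, --category or --site, plus an optional --output. A minimal argparse sketch of that interface, assuming the three targeting flags are mutually exclusive and using a hypothetical default output directory (the real script's parser is not shown on this page):

```python
import argparse

# Category keys as they appear in the quick-start commands.
CATEGORIES = ("price", "market", "company", "gov", "media", "global")


def build_parser():
    """Hypothetical parser mirroring the documented main_crawler.py flags."""
    parser = argparse.ArgumentParser(
        prog="main_crawler.py",
        description="Sichuan pepper oil industry data crawler (sketch)",
    )
    # Exactly one way of choosing what to crawl.
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--all", action="store_true", help="crawl every configured site")
    target.add_argument("--category", choices=CATEGORIES, help="crawl one site group")
    target.add_argument("--site", metavar="ADAPTER_ID", help="crawl one adapter, e.g. cnhnb")
    # Default directory is an assumption, not taken from the skill.
    parser.add_argument("--output", default="pepper_data/", help="directory for raw results")
    return parser
```

With this layout, a command such as `--all --site cnhnb` is rejected by argparse with a usage error instead of silently picking one target.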

Key Data Sources (26 sites)

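A. Raw-Material Prices and Supply/Demand (5 sites)

| Site | Domain | adapter_id | Data collected |
|---|---|---|---|
| 惠农网 (Huinong) | cnhnb.com | cnhnb | Real-time wholesale prices and historical price trends for Sichuan pepper / green Sichuan pepper |
| 一亩田 (Yimutian) | ymt.com | ymt | Farm-gate purchase prices, supplier quotes |
| 新华花椒价格指数 (Xinhua Sichuan Pepper Price Index) | indices.cnfin.com | cnfin_index | Wudu Sichuan pepper price index (daily/weekly/monthly) |
| 花椒大数据网 (Sichuan Pepper Big Data) | 860938.cn | huajiao_bigdata | Prices, planted area and output by producing region |
| 中国花椒网 (China Sichuan Pepper Net) | huajiao.cn | huajiao_cn | Market conditions and producing-region news |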

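B. Industry Research Reports (6 sites)

| Site | Domain | adapter_id | Data collected |
|---|---|---|---|
| 观研天下 (Guanyan Tianxia) | chinabaogao.com | chinabaogao | Summaries of Sichuan pepper oil / green Sichuan pepper oil industry reports |
| 中商产业研究院 (ASKCI) | askci.com | askci | Market-size forecasts, competitive landscape |
| 智研咨询 (Zhiyan Consulting) | chyxx.com | chyxx | In-depth industry reports, output data |
| 前瞻产业研究院 (Qianzhan Industry Research Institute) | qianzhan.com | qianzhan | Industry trends, market outlook |
| 中研网 (ChinaIRN) | chinairn.com | chinairn | Sichuan pepper oil market size, growth rate |
| 共研网 (Gongyan) | gonyn.com | gonyn | Industry-chain analysis, market forecasts |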

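C. Companies and Financial Reports (4 sites)

| Site | Domain | adapter_id | Data collected |
|---|---|---|---|
| 巨潮资讯网 (CNINFO) | cninfo.com.cn | cninfo | Prospectuses and annual reports (幺麻子 Yaomazi, 天味 Teway, 颐海 Yihai, etc.) |
| 东方财富 (East Money) | eastmoney.com | eastmoney | Financial data, research reports |
| 新浪财经 (Sina Finance) | finance.sina.com.cn | sina_finance | Company news, earnings analysis |
| 导油网 (oilcn) | oilcn.com | oilcn | Edible-oil industry news |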

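D. Government and Standards (4 sites)

| Site | Domain | adapter_id | Data collected |
|---|---|---|---|
| 国家林草局 (National Forestry and Grassland Administration) | forestry.gov.cn | forestry | Sichuan pepper planted area, output, policy |
| 农业农村部 (Ministry of Agriculture and Rural Affairs) | moa.gov.cn | moa | Agricultural market information |
| 海关总署 (General Administration of Customs) | customs.gov.cn | customs | Import/export data (HS: 0910991000) |
| 标准全文公开系统 (National Standards Full-Text Disclosure System) | openstd.samr.gov.cn | samr_std | National and industry standards for Sichuan pepper oil |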

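E. Financial Media (4 sites)

| Site | Domain | adapter_id | Data collected |
|---|---|---|---|
| 36氪 (36Kr) | 36kr.com | kr36 | Company analysis, funding news |
| 界面新闻 (Jiemian) | jiemian.com | jiemian | In-depth industry reporting |
| CBNData | cbndata.com | cbndata | Consumer data, company analysis |
| 央广网 (CNR) | cnr.cn | cnr | Sichuan pepper industry research reports |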

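F. Global Market (3 sites)

| Site | Domain | adapter_id | Data collected |
|---|---|---|---|
| Business Research Insights | businessresearchinsights.com | bri | Global Prickly Ash Oil market |
| Verified Market Reports | verifiedmarketreports.com | vmr | Global Sichuan pepper oil forecasts |
| WiseGuy Reports | wiseguyreports.com | wiseguy | Global Sichuan-flavor market |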
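The six site groups line up with the --category keys in the quick start (price = A, market = B, company = C, gov = D, media = E, global = F). A hypothetical in-code index of those adapter_ids, sketched to show how --all, --category and --site could resolve to a list of adapters. The real skill keeps its site list in config/targets.json, whose format is not shown here, so SITES_BY_CATEGORY, category_of and select_adapters are illustrative names only.

```python
# adapter_ids from the site tables, keyed by the quick-start --category values.
SITES_BY_CATEGORY = {
    "price": ["cnhnb", "ymt", "cnfin_index", "huajiao_bigdata", "huajiao_cn"],
    "market": ["chinabaogao", "askci", "chyxx", "qianzhan", "chinairn", "gonyn"],
    "company": ["cninfo", "eastmoney", "sina_finance", "oilcn"],
    "gov": ["forestry", "moa", "customs", "samr_std"],
    "media": ["kr36", "jiemian", "cbndata", "cnr"],
    "global": ["bri", "vmr", "wiseguy"],
}


def category_of(adapter_id):
    """Return the category key an adapter_id belongs to."""
    for category, ids in SITES_BY_CATEGORY.items():
        if adapter_id in ids:
            return category
    raise KeyError(f"unknown adapter_id: {adapter_id}")


def select_adapters(category=None, site=None):
    """Resolve --site, --category, or (neither given) --all into adapter_ids."""
    if site is not None:
        category_of(site)  # raises KeyError for an unknown id
        return [site]
    if category is not None:
        return list(SITES_BY_CATEGORY[category])
    return [aid for ids in SITES_BY_CATEGORY.values() for aid in ids]
```

Checking that the flattened list has 26 unique entries is a cheap way to catch an adapter_id typo or a site listed in two groups.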

Architecture

pepper-oil-scraper/
├── SKILL.md
├── config/
│   └── targets.json             # configuration for all sites
├── scripts/
│   ├── main_crawler.py          # main scheduling entry point
│   ├── base_scraper.py          # base class: anti-blocking, retries, rate limiting
│   ├── adapters/
│   │   ├── __init__.py          # adapter registry
│   │   ├── price_adapters.py    # group A: price sites
│   │   ├── report_adapters.py   # group B: report sites
│   │   ├── company_adapters.py  # group C: company sites
│   │   ├── gov_adapters.py      # group D: government sites
│   │   ├── media_adapters.py    # group E: media sites
│   │   └── global_adapters.py   # group F: global sites
│   ├── data_cleaner.py          # data cleaning and standardization
│   └── export_report.py         # Excel report export
├── references/
│   └── anti_crawl_guide.md      # anti-scraping strategy reference
└── templates/
    └── report_template.md       # report output template
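The tree lists an adapter registry in adapters/__init__.py and a shared base class in base_scraper.py. One common way to wire these together is a decorator-based registry; the sketch below assumes that pattern, and every name in it (ADAPTERS, register, BaseScraper, CnhnbAdapter, CustomsAdapter, adapters_for) is hypothetical. The parse methods are placeholders, not real page parsers.

```python
from abc import ABC, abstractmethod

# Hypothetical registry: adapter_id -> adapter class.
ADAPTERS = {}


def register(adapter_id, category):
    """Class decorator that records an adapter under its id and category."""
    def decorator(cls):
        if adapter_id in ADAPTERS:
            raise ValueError(f"duplicate adapter_id: {adapter_id}")
        cls.adapter_id = adapter_id
        cls.category = category
        ADAPTERS[adapter_id] = cls
        return cls
    return decorator


class BaseScraper(ABC):
    """Shared base; fetching, retries and rate limiting would live here."""
    adapter_id = ""
    category = ""

    @abstractmethod
    def parse(self, html):
        """Turn one fetched page into a list of record dicts."""


@register("cnhnb", "price")
class CnhnbAdapter(BaseScraper):
    def parse(self, html):
        # Placeholder: keep the raw text; a real adapter would extract prices.
        return [{"adapter_id": self.adapter_id, "original_text": html.strip()}]


@register("customs", "gov")
class CustomsAdapter(BaseScraper):
    def parse(self, html):
        # Placeholder, as above.
        return [{"adapter_id": self.adapter_id, "original_text": html.strip()}]


def adapters_for(category):
    """Sorted adapter_ids registered under one category."""
    return sorted(aid for aid, cls in ADAPTERS.items() if cls.category == category)
```

In this pattern each *_adapters.py module applies @register at import time, so importing the adapters package is enough to populate the registry.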

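Anti-Scraping Countermeasures

  • Random 2-5 s delay between requests (configurable per site)
  • Random User-Agent via fake-useragent
  • Referer header to mimic normal browsing
  • Exponential backoff on 403/429 (2s → 4s → 8s → ... → 60s max)
  • Proxy pool configuration supported
  • Playwright documented for JS-heavy sites (must be installed manually by the user)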
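The retry and delay rules above can be sketched in a few lines. This assumes a requests-style session (anything whose .get(url, headers=..., timeout=...) returns an object with .status_code) and treats only 403 and 429 as retryable. The names backoff_delay, polite_delay, build_headers and fetch_with_backoff are hypothetical, and the 30 s timeout and six retries are assumptions, not values from the skill.

```python
import random
import time


def backoff_delay(attempt, base=2.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based): 2, 4, 8, ... capped at 60."""
    return min(cap, base * 2 ** attempt)


def polite_delay(rng=random, low=2.0, high=5.0):
    """Random pause between ordinary requests (2-5 s by default)."""
    return rng.uniform(low, high)


def build_headers(user_agent, referer):
    """Request headers with a rotated User-Agent and a plausible Referer."""
    return {"User-Agent": user_agent, "Referer": referer}


def fetch_with_backoff(session, url, headers, max_retries=6, sleep=time.sleep):
    """GET url, retrying on HTTP 403/429 with exponential backoff.

    Returns the last response, which may still be a 403/429 once retries run out.
    """
    response = session.get(url, headers=headers, timeout=30)
    for attempt in range(max_retries):
        if response.status_code not in (403, 429):
            break
        sleep(backoff_delay(attempt))
        response = session.get(url, headers=headers, timeout=30)
    return response
```

In practice the session would be a requests.Session(), the User-Agent string could come from fake_useragent's UserAgent().random, and the caller would sleep for polite_delay() between successful requests. Injecting sleep keeps the retry loop testable without real waiting.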

Data Standardization

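All outputs use uniform units: price → CNY/kg (元/公斤), area → 10,000 mu (万亩), output → 10,000 tonnes (万吨), monetary amounts → 100 million CNY (亿元). Every record must carry the source_url, crawl_time and original_text fields.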
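A minimal normalization sketch for these rules. The target units come from the text above; the source units it accepts (for example 元/斤, a common way Chinese markets quote prices, where 1 jin = 0.5 kg, and 公顷, where 1 hectare = 15 mu) are assumptions about what the sites publish. TO_STANDARD, Record and normalize are hypothetical names, not the API of the skill's data_cleaner.py.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# source unit -> (target unit, multiplier); the source units are assumptions.
TO_STANDARD = {
    # price -> CNY/kg; 1 jin (斤) = 0.5 kg
    "元/斤": ("元/公斤", 2.0),
    "元/公斤": ("元/公斤", 1.0),
    "元/吨": ("元/公斤", 0.001),
    # area -> 10,000 mu; 1 hectare = 15 mu
    "亩": ("万亩", 1e-4),
    "公顷": ("万亩", 15e-4),
    "万亩": ("万亩", 1.0),
    # output -> 10,000 tonnes
    "吨": ("万吨", 1e-4),
    "万吨": ("万吨", 1.0),
    # money -> 100 million CNY
    "元": ("亿元", 1e-8),
    "万元": ("亿元", 1e-4),
    "亿元": ("亿元", 1.0),
}


@dataclass
class Record:
    value: float
    unit: str
    source_url: str
    original_text: str
    # ISO-8601 UTC timestamp taken when the record is created.
    crawl_time: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def normalize(value, unit, source_url, original_text):
    """Convert one scraped figure to the standard unit, keeping provenance fields."""
    if not source_url or not original_text:
        raise ValueError("source_url and original_text are required")
    target_unit, factor = TO_STANDARD[unit]  # KeyError flags an unmapped unit
    return Record(value * factor, target_unit, source_url, original_text)
```

Raising on an unmapped unit, rather than passing the value through, keeps mixed units from silently reaching the exported report.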
