Install
openclaw skills install @wangm-a3/datacrawl-debug
Use when the user needs to process web data, debug data collection code, clean processed data, or iterate on data processing strategies. Use when generating data processing code from a URL and field descriptions. Use when diagnosing data processing errors such as 403 responses, timeouts, selector failures, and encoding issues. Use when cleaning, deduplicating, normalizing, and formatting processed data. Use when optimizing data processing strategies based on run-history analysis. Use when the user mentions "数据处理", "数据整理", "数据清洗", "数据代码", "数据调试", "data processing", "data extraction", or "debug data".
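Processes it · Fixes it well · Cleans it thoroughly · Runs it reliably
The "emergency room + gym" of data processing: come to the ER when something breaks (DebugRunner), come to the gym for routine training (IterateOptimizer), and keep a nutritionist on hand throughout (DataCleaner).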
scripts/process-engine.py config --url URL --fields FIELD1 FIELD2 --mode static|dynamic|api
scripts/process-engine.py extract --html "HTML_CONTENT" --fields FIELD1 FIELD2
scripts/code-generator.py --name PROJECT_NAME --url URL --fields FIELD1 FIELD2 --mode requests_bs4|playwright|api_client
scripts/debug-runner.py --error "ERROR_MESSAGE"
scripts/data-cleaner.py clean --input DATA --remove-html --remove-duplicates
scripts/data-cleaner.py normalize --input DATA --schema SCHEMA
scripts/data-cleaner.py format --input DATA --format json|csv|jsonl --fields FIELD_LIST
scripts/iterate-optimizer.py analyze --input run_history.json
scripts/iterate-optimizer.py improve --config CURRENT_CONFIG --analysis ANALYSIS_RESULT
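To make the `clean` and `format` subcommands above more concrete, here is a minimal Python sketch of what HTML removal, deduplication, and JSONL output could look like. It is not the skill's actual implementation: the function names `strip_html`, `clean_records`, and `to_jsonl` are hypothetical, and the sketch assumes records are JSON-compatible dictionaries.

```python
import html
import json
import re

# Naive tag matcher; adequate for stripping markup from short field values.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(value: str) -> str:
    """Remove tags, decode HTML entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", value)
    text = html.unescape(text)
    return " ".join(text.split())


def clean_records(records, fields):
    """Keep only `fields`, strip HTML from string values, drop exact duplicates.

    The first occurrence of each record is kept and input order is preserved.
    """
    seen = set()
    cleaned = []
    for record in records:
        row = {}
        for field in fields:
            value = record.get(field)
            row[field] = strip_html(value) if isinstance(value, str) else value
        # Serialize with sorted keys so equal rows produce equal dedup keys.
        key = json.dumps(row, sort_keys=True, ensure_ascii=False)
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


def to_jsonl(records) -> str:
    """Format records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Deduplication happens after HTML removal on purpose, so two rows that differ only in markup collapse into one.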
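| Problem | Suggested fix |
|---|---|
| Connection failure | Check that the URL is valid; add a retry mechanism |
| Timeout | Increase the timeout; wait, then retry |
| Selector failure | Inspect the page structure; update the selector |
| Encoding issue | Specify the correct encoding; use error-tolerant parsing |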
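The table above can be sketched in Python as a small keyword-based diagnosis plus two of the suggested remedies (retry with waiting, and error-tolerant decoding). This is an illustration under assumptions, not how `debug-runner.py` works internally: the keyword lists, the function names `diagnose`, `retry`, and `decode_tolerant`, and the fallback message for unmatched errors are all hypothetical.

```python
import time

# (keywords, problem, suggestion). Keywords are illustrative guesses;
# problems and suggestions mirror the table above.
RULES = [
    (("timeout", "timed out"), "Timeout",
     "Increase the timeout; wait, then retry"),
    (("connection", "dns", "refused"), "Connection failure",
     "Check that the URL is valid; add a retry mechanism"),
    (("selector", "no such element", "nonetype"), "Selector failure",
     "Inspect the page structure; update the selector"),
    (("codec", "encoding", "decode"), "Encoding issue",
     "Specify the correct encoding; use error-tolerant parsing"),
]


def diagnose(error_message: str):
    """Return (problem, suggestion) for the first rule whose keyword matches."""
    lowered = error_message.lower()
    for keywords, problem, suggestion in RULES:
        if any(keyword in lowered for keyword in keywords):
            return problem, suggestion
    return "Unknown", "No matching rule; inspect the full error output"


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on failure wait base_delay * 2**n, then try again.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def decode_tolerant(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, replacing undecodable sequences with U+FFFD."""
    return raw.decode(encoding, errors="replace")
```

Timeout rules are checked before connection rules, so a message such as "connection timed out" is treated as a timeout. The `sleep` parameter is injectable only to keep the helper easy to test without real waiting.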
When the target site renders its content with JavaScript:
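process-engine.py config → understand the target site and get a recommended approach
code-generator.py → get a starting code template
debug-runner.py → diagnosis in seconds
data-cleaner.py → deduplicate, normalize, and format
iterate-optimizer.py → keep improving based on run data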
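As an illustration of the last step, improving from run data, the sketch below summarizes a run history into a success rate and the most frequent errors. The format of the run-history file is not documented here, so the record shape (a list of objects with `status` and `error` keys) and the function name `summarize_runs` are assumptions made purely for this example.

```python
import json
from collections import Counter


def summarize_runs(history):
    """Summarize runs: total count, success rate, and the top three errors.

    Assumes each run is a dict with a "status" key and, for failed runs,
    an "error" key (a hypothetical schema).
    """
    total = len(history)
    successes = sum(1 for run in history if run.get("status") == "success")
    errors = Counter(
        run.get("error", "unknown")
        for run in history
        if run.get("status") != "success"
    )
    return {
        "total": total,
        "success_rate": successes / total if total else 0.0,
        "top_errors": errors.most_common(3),
    }


if __name__ == "__main__":
    # Inline sample data standing in for a run-history file.
    sample = json.loads(
        '[{"status": "success"},'
        ' {"status": "failed", "error": "timeout"},'
        ' {"status": "failed", "error": "timeout"},'
        ' {"status": "failed", "error": "403"}]'
    )
    print(summarize_runs(sample))
```

A summary like this shows which failure category dominates, which is the kind of signal that could guide a configuration change such as a longer timeout or a different mode.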