Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Anti-Crawler Evasion

v1.0.0

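Prevents crawlers from being identified and banned by anti-crawler mechanisms. Use when the user needs web scraping, data collection, or API access, or asks how to bypass anti-crawler measures, avoid IP bans, or hide a crawler's identity. Trigger words include: 反爬 (anti-crawling), 反爬虫 (anti-crawler), 绕过反爬 (bypass anti-crawling), 避免封禁 (avoid bans), 爬虫伪装 (crawler disguise), 隐身爬取 (stealth crawling).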

0 stars · 165 downloads · 0 current · 0 all-time

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for plover061/anti-crawler-evasion.

Install the skill "Anti-Crawler Evasion" (plover061/anti-crawler-evasion) from ClawHub.
Skill page: https://clawhub.ai/plover061/anti-crawler-evasion
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install anti-crawler-evasion

ClawHub CLI


npx clawhub@latest install anti-crawler-evasion
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (high confidence)
[!] Purpose & Capability
The name and description match the content (anti-crawler evasion). However, the SKILL.md contains code that requires browser automation (Selenium/Playwright), proxy credentials, and third‑party CAPTCHA services—yet the skill declares no required binaries, env vars, or config paths. Legitimately using these techniques would normally require drivers, installed packages, and API/proxy credentials; the absence of those declarations is incoherent.
[!] Instruction Scope
The runtime instructions go beyond simple guidance: they include code to import browser cookies, create headless browsers and stealth drivers, simulate mouse/keyboard events, rotate proxies with credentials, and call external CAPTCHA solving APIs. Importing browser cookies or running local input-simulation libraries implies access to sensitive local data and devices. The skill does not limit or document how such local access should be obtained or consented to.
Install Mechanism
The skill is instruction-only (no install spec), which lowers direct install risk. However, the code snippets implicitly require installing Python packages (requests, selenium, playwright, selenium_stealth, mouse/keyboard libraries), browser drivers, and possibly system-level dependencies. The absence of an explicit install spec or declared binaries is a mismatch that could lead users to run ad-hoc installs or copy-paste unsafe commands.
[!] Credentials
The skill declares no required environment variables or credentials, but examples use proxy credentials (user:pass@...), and a placeholder API key for 2Captcha. It also recommends importing browser cookies (session tokens). Requesting or using those secrets without declaring them is disproportionate and increases risk of credential exposure or misuse.
Persistence & Privilege
always:false (good) and autonomous invocation is allowed (platform default). Autonomous invocation combined with instructions that access local cookies, proxies, or external CAPTCHA services increases the blast radius if the agent is allowed to act without human review—consider restricting autonomous invocation or adding explicit prompts/consent before performing sensitive actions.
What to consider before installing
This skill provides step-by-step techniques to evade anti-scraping defenses (proxies, browser fingerprinting, cookie import, CAPTCHA solving). Before installing or running it, consider: (1) legal and ethical risk—bypassing site protections can violate terms of service and laws; (2) do not supply browser cookies, proxy credentials, or API keys to an untrusted skill—those can expose your sessions and accounts; (3) the skill omits required binaries and env vars, so you would need to install packages and drivers yourself—avoid copy/pasting unknown install commands; (4) prefer using sanctioned APIs or obtaining permission from target sites; (5) if you must test, run in an isolated sandbox with no real credentials and disable autonomous invocation so the agent cannot act without your explicit approval. Ask the publisher for a clear list of required packages, exact env vars, and a justification for any operation that reads local cookies or system devices before proceeding.

Like a lobster shell, security has layers — review code before you run it.

latest vk97eh0hr8rnwtdf6nw2qknevw984gejd
165 downloads
0 stars
1 version
Updated 2w ago
v1.0.0
MIT-0

Anti-Crawler Evasion Skill

Core Strategy Overview

1. Request Disguise Strategy

User-Agent Rotation

import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36"
]

def get_random_ua():
    return random.choice(USER_AGENTS)

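Complete Request Headers

# Note: get_random_ua() runs once when this dict is built, so the
# User-Agent stays the same until HEADERS is rebuilt.
HEADERS = {
    "User-Agent": get_random_ua(),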
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Cache-Control": "max-age=0"
}

2. IP Rotation

Proxy Pool Strategy

import random
import requests

PROXY_POOL = [
    {"http": "http://user:pass@proxy1.example.com:8080"},
    {"http": "http://user:pass@proxy2.example.com:8080"},
    {"http": "http://user:pass@proxy3.example.com:8080"}
]

def get_random_proxy():
    return random.choice(PROXY_POOL)

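def fetch_with_proxy(url):
    # Pick a proxy at random for this request
    proxy = get_random_proxy()
    # A timeout keeps an unresponsive proxy from hanging the call indefinitely
    response = requests.get(url, proxies=proxy, headers=HEADERS, timeout=10)
    return response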

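Choosing a Proxy Type

Proxy type | Anonymity | Use case | Cost
Residential proxy | | Advanced anti-crawler bypass |
Datacenter proxy | | Routine crawling |
Rotating proxy | | Large-scale collection |
Free proxy | | Testing/demos |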

3. Request Rate Control

import time
import random

class RateLimiter:
    def __init__(self, min_delay=3, max_delay=10):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.last_request = 0

    def wait(self):
        delay = random.uniform(self.min_delay, self.max_delay)
        time.sleep(delay)

    def adaptive_wait(self, response):
        """Adapt the delay to the response status."""
        if response.status_code == 429:
            time.sleep(60)  # Rate-limited: wait longer
        elif response.status_code == 200:
            # After a successful request, lengthen the delay slightly to lower the risk of a ban
            delay = random.uniform(self.min_delay, self.max_delay) * 1.5
            time.sleep(delay)
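
The adaptive_wait method above sleeps for a fixed 60 seconds on HTTP 429. Servers that rate-limit often say how long to wait in a Retry-After header, which may be either a number of seconds or an HTTP date. The following is a minimal sketch, not part of the original skill, of reading that header; the function name retry_after_seconds and its default and now parameters are assumptions made for illustration. It uses a plain, case-sensitive dict lookup, so with a requests response you would pass response.headers, which is case-insensitive.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, default=60.0, now=None):
    """Return how long to wait, in seconds, based on a Retry-After header.

    `headers` is any mapping of header names to values. `default` is used
    when the header is missing or cannot be parsed (60 mirrors the fixed
    sleep in adaptive_wait).
    """
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    # Form 1: a delay in whole seconds, e.g. "120"
    if value.isdigit():
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never return a negative wait if the date is already in the past
    return max(0.0, (when - now).total_seconds())
```

A caller could then replace the fixed sleep with time.sleep(retry_after_seconds(response.headers)), so the wait follows what the server asked for.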

4. Browser Fingerprint Evasion

Fingerprint Randomization

from selenium import webdriver
from selenium_stealth import stealth

def create_stealth_driver():
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-blink-features=AutomationControlled')

    driver = webdriver.Chrome(options=options)

    stealth(driver,
        languages=["en-US", "en", "zh-CN", "zh"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
    )

    return driver

5. Cookie and Session Management

import requests
from http.cookiejar import CookieJar

session = requests.Session()

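# Keep cookies persistent across requests
session.cookies = CookieJar()

# Import cookies from a real browser
def import_browser_cookies(session, browser="chrome"):
    """Import cookies from a browser to pass the initial check."""
    # Implementation details depend on the target browser
    pass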

6. Bypassing JavaScript Challenges

Handling JS Rendering with Playwright/Selenium

from playwright.sync_api import sync_playwright

def scrape_dynamic_page(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            user_agent=get_random_ua(),
            viewport={'width': 1920, 'height': 1080}
        )
        page = context.new_page()

        # Simulate human behavior
        page.goto(url)
        page.mouse.wheel(0, 500)  # Simulate scrolling
        page.wait_for_timeout(2000)  # Wait 2 seconds (fixed here, not randomized)

        content = page.content()
        browser.close()
        return content

7. CAPTCHA Handling Strategy

Third-Party CAPTCHA Services

# 2Captcha API integration
import time
import requests

def solve_captcha(site_key, page_url):
    """Solve a CAPTCHA using 2Captcha."""
    api_key = "YOUR_API_KEY"

    # Submit the CAPTCHA
    submit_url = f"http://2captcha.com/in.php?key={api_key}&method=userrecaptcha&googlekey={site_key}&pageurl={page_url}"
    resp = requests.get(submit_url)
    captcha_id = resp.text.split('|')[1]

    # Wait for the result
    for _ in range(30):
        time.sleep(5)
        result_url = f"http://2captcha.com/res.php?key={api_key}&action=get&id={captcha_id}"
        result = requests.get(result_url)
        if result.text.startswith('OK'):
            return result.text.split('|')[1]

    return None

Advanced Evasion Techniques

Distributed Crawling Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Master    │────▶│   Workers   │────▶│  Proxy Pool │
│   Server    │     │ (multiple)  │     │ (rotate IP) │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       └───────────────────┴───────────────────┘
                           │
                    ┌──────▼──────┐
                    │  Task Queue │
                    │  (Redis)    │
                    └─────────────┘
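
The master, worker, and task-queue relationship in the diagram can be sketched in a single process. This is only an illustration under stated assumptions: queue.Queue stands in for the Redis task queue, threads stand in for separate worker machines, and run_workers and its fetch argument are hypothetical names that do not appear in the original skill. The proxy pool is left out.

```python
import queue
import threading


def run_workers(urls, fetch, num_workers=3):
    """Distribute `urls` across worker threads and collect the results.

    `fetch` is a caller-supplied function (hypothetical here) that takes
    a URL and returns a result.
    """
    # The "master" side: fill the task queue
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # Queue drained, so this worker exits
            try:
                outcome = fetch(url)
            except Exception as exc:  # Record the failure instead of crashing
                outcome = exc
            with lock:
                results[url] = outcome
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each URL is handled by exactly one worker, and a failed fetch is stored as the exception object so that one bad task does not stop the others.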

Behavior Simulation

from mouse import move, click
from keyboard import write, press
import random
import time

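def human_behavior_simulation(driver, element):
    """Simulate human-like interaction with an element."""
    rect = element.rect

    # Random offset to simulate mouse movement
    x = rect['x'] + random.randint(5, 20)
    y = rect['y'] + random.randint(5, 20)

    # Move the mouse along a non-straight path
    move_in_human_pattern(x, y)

    # Random delay
    time.sleep(random.uniform(0.1, 0.3))

    # Click at the current pointer position (mouse.click takes a button, not coordinates)
    click()

def move_in_human_pattern(target_x, target_y):
    """Move the mouse along a human-like path."""
    # Add random intermediate points.
    # get_current_mouse_position and generate_human_path are not defined in this document.
    current_x, current_y = get_current_mouse_position()
    points = generate_human_path(current_x, current_y, target_x, target_y)

    for x, y in points:
        move(x, y)
        time.sleep(random.uniform(0.01, 0.03))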

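Detection Evasion Checklist

Detection type | Evasion method | Priority
IP rate detection | Proxy rotation + delays |
User-Agent detection | UA rotation pool |
Cookie/Session detection | Real browser cookies |
Behavior pattern detection | Human-like interaction simulation |
Browser fingerprint detection | Stealth mode |
JavaScript detection | Use a real browser |
CAPTCHA | Third-party recognition service | As needed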

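Best Practices

  1. Gradual rollout: start at a low request rate and adjust the strategy step by step
  2. Monitor responses: watch HTTP status codes and response times closely
  3. Fallbacks: prepare multiple data sources to avoid depending on a single one
  4. Follow the rules: give priority to complying with robots.txt and the site's terms
  5. Logging: record requests and responses in detail to make troubleshooting easier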
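
Item 4 in the list above can be checked in code with the standard library. The sketch below is an assumption-laden illustration rather than part of the original skill: build_robots_checker, the example-bot user agent, and the example rules are all made up, and the caller is expected to have downloaded the robots.txt text already.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent):
    """Return a function that reports whether `user_agent` may fetch a URL.

    `robots_txt` is the text of a site's robots.txt, already downloaded
    by the caller.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Example rules, made up for illustration
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

allowed = build_robots_checker(EXAMPLE_ROBOTS, "example-bot")
```

Calling allowed(url) before each request lets a crawler skip paths the site has asked crawlers not to visit.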

Common Anti-Crawler Bypass Scenarios

Scenario 1: IP Ban

  • Symptom: HTTP 403/429
  • Fix: use a proxy pool and lower the request rate
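
One common way to lower the request rate after a 403 or 429 is exponential backoff with jitter. The sketch below is not from the original skill; backoff_delay and its base, cap, and rng parameters are hypothetical, with base=3.0 chosen only to echo the min_delay default of RateLimiter.

```python
import random


def backoff_delay(attempt, base=3.0, cap=300.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles on each attempt up to `cap`, and the actual delay
    is drawn uniformly between half the ceiling and the ceiling.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(ceiling / 2, ceiling)
```

With the defaults, the first retry waits between 1.5 and 3 seconds, and later retries wait longer until the ceiling reaches the 300-second cap.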

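Scenario 2: CAPTCHA Interception

  • Symptom: a CAPTCHA appears on the page
  • Fix: a CAPTCHA recognition service or manual handling

Scenario 3: JavaScript Rendering

  • Symptom: page content is empty or encrypted
  • Fix: use Playwright/Selenium

Scenario 4: Behavior Analysis Blocking

  • Symptom: no obvious error, but the data is abnormal
  • Fix: add human-like behavior simulation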

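Recommended Tools

Tool | Purpose | Scenario
Playwright | Browser automation | JS-rendered pages
Selenium + Stealth | Browser disguise | Pages that require login
ScraperAPI | Cloud proxy service | Quick integration
Crawlera | Smart proxy pool | Enterprise applications
2Captcha | CAPTCHA recognition | CAPTCHA interception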
