WrynAI Skill

Perform advanced web crawling and content extraction with multi-page crawling, search result parsing, pattern filtering, and screenshot capture using the WrynAI SDK.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal
Benign
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The SKILL.md clearly describes a WrynAI web-crawling skill (crawl, search, screenshots, content extraction) which is coherent with the skill name, but the package metadata gives no description or homepage and lists no required environment variables or primary credential. That mismatch (no declared env vars vs SKILL.md requiring WRYNAI_API_KEY) is unexpected and reduces trust in the metadata.
Instruction Scope
The instructions are focused on crawling/searching tasks and do not instruct the agent to access unrelated files or credentials. They do, however, tell the agent to read an environment variable (WRYNAI_API_KEY) and to pip install the 'wrynai' package — both are within the expected scope for a third-party SDK but the env var is not declared in the registry metadata.
Install Mechanism
There is no formal install spec in the registry; the SKILL.md instructs users to run 'pip install wrynai'. Instruction-only skills are lower risk, but pip-installing an external package can execute arbitrary code at install time. There is no link to a PyPI package or authoritative repo in the metadata to verify the package identity.
Credentials
The SKILL.md requires a single API key (WRYNAI_API_KEY), which is proportionate for a hosted SDK. However the declared requirements list in the registry is empty (no required env vars, no primary credential). That inconsistency is a red flag — the skill will need an API key to function, but the registry does not advertise that requirement.
Persistence & Privilege
The skill is instruction-only, does not request 'always: true', and does not ask to modify system or other skill configurations. It does not request elevated persistence or platform-wide privileges.
What to consider before installing
This skill appears to be a client wrapper for the WrynAI web-crawling service and legitimately needs an API key and the WrynAI Python package. Before installing or providing an API key:

  1. Verify the WrynAI service (https://wryn.ai) and the identity of the 'wrynai' PyPI package or repository (confirm package author, homepage, and release history on PyPI or GitHub).
  2. Note the registry metadata omission: the skill metadata does not declare the WRYNAI_API_KEY requirement. Treat that as a potential oversight and ask the publisher to correct it.
  3. Remember that pip installs run code at install time; only install packages from trusted sources.
  4. Consider the data-flow risk: crawled content will be sent to the WrynAI service, and the API key authenticates those requests. If you will crawl sites with sensitive content, verify the service's privacy and retention policies.
  5. If you need stronger assurance, ask the publisher to add a homepage, explicit required env vars/primary credential in metadata, and a formal install spec (or the SDK source) so you can audit the exact behavior.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

WrynAI Web Crawling Skill

Overview

This skill enables OpenClaw to perform advanced web crawling and content extraction using the WrynAI SDK. It provides capabilities for multi-page crawling, content extraction, search engine results parsing, and intelligent data gathering from websites.

Core Capabilities

  • Multi-page crawling with depth and breadth control
  • Content extraction (text, markdown, structured data, links)
  • Search engine results parsing (SERP data)
  • Screenshot capture (viewport and full-page)
  • Smart listing extraction (e-commerce, directory pages)
  • Pattern-based URL filtering for targeted crawling

Prerequisites

Environment Setup

# Install the WrynAI SDK
pip install wrynai

# Set your API key as environment variable
export WRYNAI_API_KEY="your-api-key-here"

API Key

Sign up at https://wryn.ai to obtain an API key. The key must be set in the WRYNAI_API_KEY environment variable.

Usage Patterns

1. Basic Website Crawling

Use this when the user wants to crawl an entire website or section of a website.

import os
from wrynai import WrynAI, WrynAIError

def crawl_website(url: str, max_pages: int = 10) -> dict:
    """
    Crawl a website starting from the given URL.
    
    Args:
        url: Starting URL for the crawl
        max_pages: Maximum number of pages to crawl (hard limit: 10)
    
    Returns:
        Dictionary containing crawl results with pages and their content
    """
    api_key = os.environ.get("WRYNAI_API_KEY")
    if not api_key:
        raise ValueError("WRYNAI_API_KEY environment variable required")
    
    try:
        with WrynAI(api_key=api_key) as client:
            result = client.crawl(
                url=url,
                max_pages=min(max_pages, 10),  # Hard limit enforced
                max_depth=3,
                return_urls=True,
            )
            
            return {
                "success": result.success,
                "total_pages": result.total_pages,
                "total_visited": result.total_visited,
                "pages": [
                    {
                        "url": page.page_url,
                        "content": page.content,
                        "urls_found": len(page.urls),
                        "discovered_urls": page.urls[:10],  # First 10 URLs
                    }
                    for page in result.pages
                ],
            }
    except WrynAIError as e:
        return {
            "success": False,
            "error": str(e),
            "status_code": getattr(e, 'status_code', None),
        }

When to use:

  • User asks to "crawl a website"
  • User wants to gather content from multiple pages
  • User needs to discover site structure

2. Documentation Crawling

Specialized crawling for documentation sites with pattern filtering.

import os

from wrynai import WrynAI

def crawl_documentation(base_url: str, doc_patterns: list = None) -> list:
    """
    Crawl documentation sites with targeted URL patterns.
    
    Args:
        base_url: Base URL of the documentation site
        doc_patterns: List of URL patterns to include (e.g., ["/docs/", "/api/"])
    
    Returns:
        List of crawled documentation pages with content
    """
    api_key = os.environ.get("WRYNAI_API_KEY")
    doc_patterns = doc_patterns or ["/docs/", "/guide/", "/api/", "/reference/"]
    
    with WrynAI(api_key=api_key) as client:
        result = client.crawl(
            url=base_url,
            max_pages=10,
            max_depth=3,
            include_patterns=doc_patterns,
            exclude_patterns=["/internal/", "/draft/", "/changelog/", "/admin/"],
            return_urls=True,
            timeout_ms=60000,  # 60 seconds for documentation crawling
        )
        
        return [
            {
                "url": page.page_url,
                "content": page.content,
                "word_count": len(page.content.split()),
            }
            for page in result.pages
        ]

When to use:

  • User needs to extract documentation content
  • User wants to crawl specific sections of a site
  • User needs to build a knowledge base from docs

3. Search + Crawl Pipeline

Search for topics and crawl the top results.

import os
import time

from wrynai import WrynAI, CountryCode, WrynAIError

def search_and_crawl(query: str, num_sites: int = 3, country: str = "US") -> list:
    """
    Search for a query and crawl the top results.
    
    Args:
        query: Search query
        num_sites: Number of top results to crawl
        country: Country code for search localization
    
    Returns:
        List of search results with crawled content
    """
    api_key = os.environ.get("WRYNAI_API_KEY")
    
    with WrynAI(api_key=api_key) as client:
        # Step 1: Perform search
        try:
            search_result = client.search(
                query=query,
                num_results=num_sites,
                country_code=getattr(CountryCode, country, CountryCode.US),
                timeout_ms=120000,
            )
        except WrynAIError as e:
            return [{"error": f"Search failed: {str(e)}"}]
        
        # Step 2: Crawl each result
        results = []
        for result in search_result.organic_results[:num_sites]:
            try:
                crawl_result = client.crawl(
                    url=result.url,
                    max_pages=3,
                    max_depth=1,
                    timeout_ms=60000,
                )
                
                results.append({
                    "search_position": result.position,
                    "title": result.title,
                    "url": result.url,
                    "snippet": result.snippet,
                    "crawled_pages": [
                        {
                            "url": page.page_url,
                            "content_preview": page.content[:500],
                            "full_content": page.content,
                        }
                        for page in crawl_result.pages
                    ],
                })
                
                # Rate limiting courtesy
                time.sleep(1)
                
            except WrynAIError as e:
                results.append({
                    "title": result.title,
                    "url": result.url,
                    "error": str(e),
                })
        
        return results

When to use:

  • User wants to research a topic comprehensively
  • User needs content from top search results
  • User wants to compare information across multiple sources

4. Content Extraction Only

Extract specific content types without crawling.

import os

from wrynai import WrynAI, WrynAIError

def extract_page_content(url: str, content_type: str = "text") -> dict:
    """
    Extract specific content from a single page.
    
    Args:
        url: Target URL
        content_type: Type of content to extract 
                     ("text", "markdown", "structured", "links", "title")
    
    Returns:
        Dictionary with extracted content
    """
    api_key = os.environ.get("WRYNAI_API_KEY")
    
    with WrynAI(api_key=api_key) as client:
        try:
            if content_type == "text":
                result = client.extract_text(url, extract_main_content=True)
                return {"url": url, "text": result.text}
            
            elif content_type == "markdown":
                result = client.extract_markdown(url, extract_main_content=True)
                return {"url": url, "markdown": result.markdown}
            
            elif content_type == "structured":
                result = client.extract_structured_text(url)
                return {
                    "url": url,
                    "main_text": result.main_text,
                    "headings": [
                        {"level": h.level, "tag": h.tag, "text": h.text}
                        for h in result.headings
                    ],
                    "links": [
                        {"text": l.text, "url": l.url, "internal": l.internal}
                        for l in result.links
                    ],
                }
            
            elif content_type == "links":
                result = client.extract_links(url)
                return {
                    "url": url,
                    "links": [
                        {"text": l.text, "url": l.url, "internal": l.internal}
                        for l in result.links
                    ],
                }
            
            elif content_type == "title":
                result = client.extract_title(url)
                return {"url": url, "title": result.title}
            
            else:
                return {"error": f"Unknown content_type: {content_type}"}
                
        except WrynAIError as e:
            return {"url": url, "error": str(e)}

When to use:

  • User needs specific content from a single page
  • User wants structured data extraction
  • User needs to extract links or headings

5. Robust Crawling with Error Handling

Production-ready crawling with retry logic and rate limit handling.

import os
import time

from wrynai import WrynAI, RateLimitError, TimeoutError, ServerError, WrynAIError

def robust_crawl(url: str, max_attempts: int = 3, max_pages: int = 10) -> dict:
    """
    Crawl with automatic retry and error recovery.
    
    Args:
        url: Starting URL
        max_attempts: Maximum retry attempts
        max_pages: Maximum pages to crawl
    
    Returns:
        Crawl results with success status
    """
    api_key = os.environ.get("WRYNAI_API_KEY")
    
    with WrynAI(api_key=api_key, max_retries=3) as client:
        for attempt in range(max_attempts):
            try:
                result = client.crawl(
                    url=url,
                    max_pages=max_pages,
                    max_depth=3,
                    timeout_ms=60000,
                    retries=2,
                )
                
                return {
                    "success": True,
                    "attempt": attempt + 1,
                    "total_visited": result.total_visited,
                    "pages": [
                        {
                            "url": page.page_url,
                            "content_length": len(page.content),
                            "urls_found": len(page.urls),
                        }
                        for page in result.pages
                    ],
                }
            
            except RateLimitError as e:
                wait_time = e.retry_after or (2 ** attempt * 5)
                print(f"Rate limited. Waiting {wait_time}s before retry...")
                time.sleep(wait_time)
                continue
            
            except TimeoutError:
                print(f"Timeout on attempt {attempt + 1}. Retrying...")
                continue
            
            except ServerError as e:
                wait_time = 2 ** attempt
                print(f"Server error: {e}. Waiting {wait_time}s...")
                time.sleep(wait_time)
                continue
            
            except WrynAIError as e:
                return {
                    "success": False,
                    "error": str(e),
                    "error_type": type(e).__name__,
                    "attempt": attempt + 1,
                }
        
        return {
            "success": False,
            "error": "Maximum retry attempts exceeded",
            "attempts": max_attempts,
        }

When to use:

  • Production environments requiring reliability
  • Crawling sites with rate limits
  • When dealing with potentially unstable targets

6. JavaScript-Heavy Sites

For single-page applications and JavaScript-rendered content.

import os

from wrynai import WrynAI, Engine

def crawl_spa(url: str, max_pages: int = 5) -> dict:
    """
    Crawl single-page applications or JavaScript-heavy sites.
    
    Args:
        url: Starting URL
        max_pages: Maximum pages to crawl
    
    Returns:
        Crawl results with rendered content
    """
    api_key = os.environ.get("WRYNAI_API_KEY")
    
    with WrynAI(api_key=api_key) as client:
        result = client.crawl(
            url=url,
            max_pages=max_pages,
            max_depth=2,
            engine=Engine.STEALTH_MODE,  # Use browser rendering
            timeout_ms=90000,  # Longer timeout for JS rendering
            return_urls=True,
        )
        
        return {
            "success": result.success,
            "total_visited": result.total_visited,
            "pages": [
                {
                    "url": page.page_url,
                    "content": page.content,
                    "urls_found": len(page.urls),
                }
                for page in result.pages
            ],
        }

When to use:

  • User needs to crawl React/Vue/Angular applications
  • Content is dynamically loaded via JavaScript
  • Anti-bot protection is present

Key Parameters & Configuration

Crawl Limits

# Hard limits enforced by the API
MAX_PAGES = 10      # Maximum pages per crawl
MAX_DEPTH = 3       # Maximum link depth

Engine Selection

Engine.SIMPLE         # Fast, for static HTML (default)
Engine.STEALTH_MODE   # Slower, for JavaScript-rendered content

Timeout Recommendations

# Simple scraping: 30,000 ms (30 seconds)
# Crawling: 60,000 ms (60 seconds) 
# Search operations: 120,000 ms (2 minutes)
# Smart extraction: 45,000 ms (45 seconds)
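The recommendations above can be captured in a small lookup helper. This is a local convenience sketch; the operation names are this guide's own labels, not identifiers defined by the wrynai SDK.

```python
# Timeout recommendations from the table above, keyed by the kind of
# operation. These names are illustrative conventions for local code,
# not wrynai SDK constants.
TIMEOUT_MS = {
    "scrape": 30_000,   # simple single-page scraping
    "crawl": 60_000,    # multi-page crawling
    "search": 120_000,  # search operations
    "smart": 45_000,    # smart extraction
}

def recommended_timeout_ms(operation: str) -> int:
    """Return the recommended timeout_ms, defaulting to 30s for unknown ops."""
    return TIMEOUT_MS.get(operation, 30_000)
```

Pass the result as the `timeout_ms` argument, e.g. `client.crawl(url, timeout_ms=recommended_timeout_ms("crawl"))`.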

URL Pattern Filtering

# Common patterns for include_patterns
DOCS_PATTERNS = ["/docs/", "/guide/", "/api/", "/reference/"]
BLOG_PATTERNS = ["/blog/", "/posts/", "/articles/"]

# Common patterns for exclude_patterns
EXCLUDE_PATTERNS = ["/admin/", "/login/", "/draft/", "/internal/"]
MEDIA_EXCLUDE = [".pdf", ".jpg", ".png", ".mp4", ".zip"]
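If you want to preview which URLs a pattern set would keep before spending crawl budget, a local filter can approximate the behavior. This sketch assumes plain substring matching; the API's actual matching semantics (substring, glob, or regex) are not specified here, so verify against the service before relying on exact parity.

```python
def url_passes_filters(url, include_patterns=None, exclude_patterns=None):
    """Approximate include/exclude filtering with substring matching.

    Assumption: patterns match as plain substrings of the URL. The
    wrynai API may use different semantics; treat this as a preview
    tool, not a guarantee.
    """
    if exclude_patterns and any(p in url for p in exclude_patterns):
        return False
    if include_patterns:
        return any(p in url for p in include_patterns)
    return True
```

For example, `url_passes_filters("https://site/docs/x", DOCS_PATTERNS, EXCLUDE_PATTERNS)` keeps documentation URLs while dropping `/admin/` pages.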

Error Handling

Exception Types

from wrynai import (
    WrynAIError,           # Base exception
    AuthenticationError,    # Invalid API key (401)
    BadRequestError,        # Invalid parameters (400)
    RateLimitError,         # Rate limit exceeded (429)
    TimeoutError,           # Request timeout
    ServerError,            # Server error (5xx)
    ConnectionError,        # Network issue
    ValidationError,        # Local validation error
)

Error Handling Pattern

try:
    result = client.crawl(url)
except AuthenticationError:
    # Check WRYNAI_API_KEY environment variable
    pass
except RateLimitError as e:
    # Wait for e.retry_after seconds
    time.sleep(e.retry_after or 60)
except TimeoutError:
    # Increase timeout_ms parameter
    pass
except WrynAIError as e:
    # General API error
    print(f"Error: {e} (status: {getattr(e, 'status_code', None)})")

Best Practices

1. Always Use Environment Variables

import os
api_key = os.environ.get("WRYNAI_API_KEY")
if not api_key:
    raise ValueError("WRYNAI_API_KEY environment variable required")

2. Use Context Managers

# Recommended - automatic resource cleanup
with WrynAI(api_key=api_key) as client:
    result = client.crawl(url)

# Not recommended - manual cleanup required
client = WrynAI(api_key=api_key)
try:
    result = client.crawl(url)
finally:
    client.close()

3. Set Appropriate Timeouts

# For simple pages
timeout_ms=30000

# For crawling multiple pages
timeout_ms=60000

# For JavaScript-heavy sites
timeout_ms=90000

4. Graceful Degradation

try:
    # Try structured extraction first
    result = client.extract_structured_text(url)
    content = result.main_text
except Exception:
    try:
        # Fall back to simple text
        result = client.extract_text(url)
        content = result.text
    except Exception:
        content = None

5. Respect Rate Limits

import time

for url in urls:
    result = client.crawl(url)
    time.sleep(1)  # Be nice to the API

Advanced Features

Smart Listing Extraction (PRO)

Extract structured data from listing pages (e-commerce, directories).

import os

from wrynai import WrynAI, Engine

def extract_product_listings(url: str) -> list:
    """Extract product information from listing pages."""
    api_key = os.environ.get("WRYNAI_API_KEY")
    
    with WrynAI(api_key=api_key) as client:
        result = client.auto_listing(
            url=url,
            engine=Engine.STEALTH_MODE,
            timeout_ms=60000,
        )
        
        return [
            {
                "title": item.get("title"),
                "price": item.get("price"),
                "rating": item.get("rating"),
                "url": item.get("url"),
            }
            for item in result.items
        ]

Screenshot Capture

import base64
import os

from wrynai import WrynAI, ScreenshotType

def capture_page_screenshot(url: str, fullpage: bool = False) -> str:
    """Capture page screenshot and save to file."""
    api_key = os.environ.get("WRYNAI_API_KEY")
    
    with WrynAI(api_key=api_key) as client:
        result = client.take_screenshot(
            url=url,
            screenshot_type=ScreenshotType.FULLPAGE if fullpage else ScreenshotType.VIEWPORT,
            timeout_ms=30000,
        )
        
        # Decode and save
        image_data = result.screenshot
        if "," in image_data:
            image_data = image_data.split(",")[1]
        
        filename = "screenshot.png"
        with open(filename, "wb") as f:
            f.write(base64.b64decode(image_data))
        
        return filename

Common Use Cases

1. Competitive Research

"Search for [topic] and crawl the top 5 results"

2. Documentation Aggregation

"Crawl the Python documentation and extract all API references"

3. Content Migration

"Crawl our old website and extract all blog posts in markdown"

4. Link Analysis

"Find all external links on [website]"

5. Site Monitoring

"Crawl [site] and check if [content] is present"

6. Knowledge Base Creation

"Crawl [documentation site] and create a searchable knowledge base"

Limitations & Considerations

  1. Hard Limits: Maximum 10 pages per crawl, depth of 3
  2. Rate Limits: API has rate limits; handle RateLimitError appropriately
  3. Timeout Management: Adjust timeouts based on site complexity
  4. JavaScript Rendering: Use Engine.STEALTH_MODE for SPAs (slower but necessary)
  5. Robots.txt: SDK respects robots.txt; some pages may be blocked
  6. Dynamic Content: Some dynamically loaded content may require stealth mode

Troubleshooting

Common Issues

Issue: AuthenticationError

  • Solution: Verify WRYNAI_API_KEY environment variable is set correctly

Issue: RateLimitError

  • Solution: Implement retry with e.retry_after wait time

Issue: TimeoutError

  • Solution: Increase timeout_ms parameter

Issue: Empty content returned

  • Solution: Try Engine.STEALTH_MODE for JavaScript-rendered pages

Issue: Missing links/content

  • Solution: Check exclude_patterns and include_patterns configuration

Integration with OpenClaw

When using this skill with OpenClaw:

  1. Set environment variable before running:

    export WRYNAI_API_KEY="your-api-key"
    
  2. Install dependencies:

    pip install wrynai
    
  3. Use in your OpenClaw workflows:

    • Call the crawling functions directly from your automation scripts
    • Integrate with other OpenClaw skills for comprehensive data pipelines
    • Use the returned data structures in downstream processing
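A minimal sketch of step 3, wiring a crawl into a downstream step. The report format here is this guide's own illustration, and the wrynai import is deferred inside `main` so the formatting helper can be reused and tested without the SDK installed.

```python
import os

def build_report(pages):
    """Format crawled pages into a plain-text digest for downstream use.

    Each page is a dict with "url" and "content" keys; the layout of
    the report is an illustrative convention, not part of wrynai.
    """
    lines = [f"Crawled {len(pages)} page(s)"]
    for page in pages:
        lines.append(f"- {page['url']} ({len(page['content'])} chars)")
    return "\n".join(lines)

def main(url):
    # Deferred import: only required when actually crawling.
    from wrynai import WrynAI

    api_key = os.environ["WRYNAI_API_KEY"]
    with WrynAI(api_key=api_key) as client:
        result = client.crawl(url=url, max_pages=10, max_depth=3)
        pages = [{"url": p.page_url, "content": p.content} for p in result.pages]
    return build_report(pages)
```

The returned string (or the intermediate `pages` list) can then be handed to other OpenClaw skills in the pipeline.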


Version Information

  • Skill Version: 1.0.0
  • SDK Version: wrynai v1.0.0
  • Python Version: 3.8+
  • Last Updated: 2025-02-07

Files

1 total