Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Tabstack Extractor

v0.1.0

Extract structured data from websites using Tabstack API. Use when you need to scrape job listings, news articles, product pages, or any structured web content. Provides JSON schema-based extraction and clean markdown conversion. Requires TABSTACK_API_KEY environment variable.

Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name, description, and included code (Python and curl wrappers) consistently target the Tabstack API (api.tabstack.ai). However, the registry metadata lists no required environment variables or primary credential, while the SKILL.md and all code require a TABSTACK_API_KEY; this mismatch is a notable inconsistency.
Instruction Scope
SKILL.md instructs use of a Babashka script ('scripts/tabstack.clj', via bb commands) and offers a configuration-file alternative, but the file manifest does not include tabstack.clj; only tabstack_api.py and tabstack_curl.sh are present. The runtime instructions therefore reference code that isn't bundled, leaving the agent ambiguous discretion in how to proceed. Beyond that, the instructions only read local schema files and the TABSTACK_API_KEY, and target api.tabstack.ai (no other external endpoints).
Install Mechanism
There is no formal install spec (instruction-only), which limits risk. SKILL.md recommends installing Babashka using a curl|bash command from a GitHub raw URL — that's a common but higher-risk install pattern (pipe-to-shell). The bundled code itself has no install/download steps and uses standard Python 'requests' and curl calls.
Credentials
All code and the SKILL.md expect a single TABSTACK_API_KEY, which is proportionate to the stated purpose. However, the registry metadata does not declare this required env var or a primary credential, a mismatch that could confuse permission reviews or automation. No other secrets or unrelated env vars are requested.
Persistence & Privilege
The skill does not request always: true or other elevated persistence. It is user-invocable and allows normal autonomous invocation. It does not attempt to modify other skills or system configs.
What to consider before installing
  • Confirm the TABSTACK_API_KEY requirement: the SKILL.md and both included wrappers require TABSTACK_API_KEY, but the registry metadata doesn't list it. Only provide an API key from a trusted Tabstack account, and give it the minimum scope required.
  • Inspect missing files: the docs instruct running bb scripts/tabstack.clj, but tabstack.clj is not present in the bundle. Ask the publisher why the referenced Babashka script is missing, or obtain the correct bundle before running commands.
  • Avoid running curl | bash blindly: the quick start suggests installing Babashka via a curl-based install script. Prefer installing Babashka from your OS package manager, or review the install script first.
  • Review the shipped scripts yourself: the provided Python and bash wrappers post only to https://api.tabstack.ai/v1 and read local schema files; verify there are no hidden endpoints or credential leaks before use.
  • Test with non-sensitive URLs/data first, and verify network traffic (or run in an isolated environment) to ensure behavior matches expectations.

Confidence notes: this assessment is medium confidence because the code present matches the stated purpose, but the missing referenced script and the registry metadata mismatch are unresolved ambiguities. If the publisher provides the missing tabstack.clj, or updates the registry metadata to declare TABSTACK_API_KEY, that would reduce concern.

Like a lobster shell, security has layers — review code before you run it.

latest vk977jfsvn858q7bfs1738jsd0s808ggm
1.9k downloads
0 stars
1 version
Updated 2h ago
v0.1.0
MIT-0

Tabstack Extractor

Overview

This skill enables structured data extraction from websites using the Tabstack API. It's ideal for web scraping tasks where you need consistent, schema-based data extraction from job boards, news sites, product pages, or any structured content.

Quick Start

1. Install Babashka (if needed)

# Option A: From GitHub (recommended for sharing)
curl -s https://raw.githubusercontent.com/babashka/babashka/master/install | bash

# Option B: From Nix
nix-shell -p babashka

# Option C: From Homebrew
brew install borkdude/brew/babashka

2. Set up API Key

Option A: Environment variable (recommended)

export TABSTACK_API_KEY="your_api_key_here"

Option B: Configuration file

mkdir -p ~/.config/tabstack
echo '{:api-key "your_api_key_here"}' > ~/.config/tabstack/config.edn

Get an API key: Sign up at Tabstack Console

3. Test Connection

bb scripts/tabstack.clj test

4. Extract Markdown (Simple)

bb scripts/tabstack.clj markdown "https://example.com"

5. Extract JSON (Start Simple)

# Start with simple schema (fast, reliable)
bb scripts/tabstack.clj json "https://example.com" references/simple_article.json

# Try more complex schemas (may be slower)
bb scripts/tabstack.clj json "https://news.site" references/news_schema.json

6. Advanced Features

# Extract with retry logic (3 retries, 1s delay)
bb scripts/tabstack.clj json-retry "https://example.com" references/simple_article.json

# Extract with caching (24-hour cache)
bb scripts/tabstack.clj json-cache "https://example.com" references/simple_article.json

# Batch extract from URLs file
echo "https://example.com" > urls.txt
echo "https://example.org" >> urls.txt
bb scripts/tabstack.clj batch urls.txt references/simple_article.json
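
The Babashka script referenced above is not in the bundle, so as a stdlib-only Python sketch of the URL-list handling for the batch step (skipping blank lines and '#' comments is an assumption, not behavior confirmed from the missing script):

```python
def read_urls(path):
    """Read one target URL per line, ignoring blank lines and '#' comments."""
    with open(path) as f:
        return [
            line.strip()
            for line in f
            if line.strip() and not line.lstrip().startswith("#")
        ]
```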

Core Capabilities

1. Markdown Extraction

Extract clean, readable markdown from any webpage. Useful for content analysis, summarization, or archiving.

When to use: When you need the textual content of a page without the HTML clutter.

Example use cases:

  • Extract article content for summarization
  • Archive webpage content
  • Analyze blog post content

2. JSON Schema Extraction

Extract structured data using JSON schemas. Define exactly what data you want and get it in a consistent format.

When to use: When scraping job listings, product pages, news articles, or any structured data.

Example use cases:

  • Scrape job listings from BuiltIn/LinkedIn
  • Extract product details from e-commerce sites
  • Gather news articles with consistent metadata
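
The contents of references/simple_article.json are not reproduced on this page; as a hedged illustration only, a minimal four-field article schema in the same style as the job schema shown later might look like this (the field names are assumptions):

```json
{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "author": {"type": "string"},
    "published_date": {"type": "string"},
    "content": {"type": "string"}
  }
}
```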

3. Schema Templates

Pre-built schemas for common scraping tasks. See references/ directory for templates.

Available schemas:

  • Job listing schema (see references/job_schema.json)
  • News article schema
  • Product page schema
  • Contact information schema

Workflow: Job Scraping Example

Follow this workflow to scrape job listings:

  1. Identify target sites - BuiltIn, LinkedIn, company career pages
  2. Choose or create schema - Use references/job_schema.json or customize
  3. Test extraction - Run a single page to verify schema works
  4. Scale up - Process multiple URLs
  5. Store results - Save to database or file

Example job schema:

{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "company": {"type": "string"},
    "location": {"type": "string"},
    "description": {"type": "string"},
    "salary": {"type": "string"},
    "apply_url": {"type": "string"},
    "posted_date": {"type": "string"},
    "requirements": {"type": "array", "items": {"type": "string"}}
  }
}
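
Neither the wrapper sources nor the API reference are reproduced here, so the following Python sketch only illustrates the general shape of a schema-based extraction call. The /json endpoint path and the url/schema payload field names are assumptions (the scan confirms only the base URL https://api.tabstack.ai/v1), and while the bundled tabstack_api.py uses the requests library, this sketch sticks to the standard library:

```python
import json
import os
import urllib.request

API_BASE = "https://api.tabstack.ai/v1"  # base URL named in the scan results


def build_payload(url, schema):
    """Combine a target URL and a JSON schema into a request body.

    The 'url' and 'schema' field names are illustrative assumptions."""
    return {"url": url, "schema": schema}


def extract_json(url, schema_path, timeout=45):
    """POST an extraction request; the '/json' path is hypothetical."""
    with open(schema_path) as f:
        schema = json.load(f)
    req = urllib.request.Request(
        f"{API_BASE}/json",
        data=json.dumps(build_payload(url, schema)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TABSTACK_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

The 45-second timeout default matches the timeout figure mentioned under Common Mistakes below; adjust it per schema complexity.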

Integration with Other Skills

Combine with Web Search

  1. Use web_search to find relevant URLs
  2. Use Tabstack to extract structured data from those URLs
  3. Store results in Datalevin (future skill)

Combine with Browser Automation

  1. Use browser tool to navigate complex sites
  2. Extract page URLs
  3. Use Tabstack for structured extraction

Error Handling

Common issues and solutions:

  1. Authentication failed - Check TABSTACK_API_KEY environment variable
  2. Invalid URL - Ensure URL is accessible and correct
  3. Schema mismatch - Adjust schema to match page structure
  4. Rate limiting - Add delays between requests
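
Items 1 through 4 are all transient or recoverable, so a generic retry helper covers them. This is an illustrative sketch, not the skill's actual implementation; the defaults mirror the documented "3 retries, 1s delay" behavior, and the exponential backoff multiplier is my own assumption:

```python
import time


def with_retries(fn, attempts=3, delay=1.0, backoff=2.0):
    """Call fn(), retrying on exception with a growing delay between tries.

    Defaults mirror the documented 3-retry / 1-second behavior; the
    backoff multiplier is an illustrative assumption."""
    last_err = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as err:  # in real use, narrow to network/HTTP errors
            last_err = err
            if attempt < attempts - 1:
                time.sleep(delay * (backoff ** attempt))
    raise last_err
```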

Resources

scripts/

  • tabstack.clj - Main API wrapper in Babashka (recommended, has retry logic, caching, batch processing)
  • tabstack_curl.sh - Bash/curl fallback (simple, no dependencies)
  • tabstack_api.py - Python API wrapper (requires requests module)

references/

  • job_schema.json - Template schema for job listings
  • api_reference.md - Tabstack API documentation

Best Practices

  1. Start small - Test with single pages before scaling
  2. Respect robots.txt - Check site scraping policies
  3. Add delays - Avoid overwhelming target sites
  4. Validate schemas - Test schemas on sample pages
  5. Handle errors gracefully - Implement retry logic for failed requests
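
Practice 2 can be automated with the standard library's robotparser. A minimal sketch, where the user-agent string is an illustrative assumption:

```python
from urllib import robotparser


def allowed_by_rules(robots_txt, url, user_agent="tabstack-extractor"):
    """Evaluate a robots.txt body to decide whether a URL may be fetched."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)
```

In practice you would fetch https://site/robots.txt once per host and cache the parsed rules before scraping that host's pages.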

Teaching Focus: How to Create Schemas

This skill is designed to teach agents how to use Tabstack API effectively. The key is learning to create appropriate JSON schemas for different websites.

Learning Path

  1. Start Simple - Use references/simple_article.json (4 basic fields)
  2. Test Extensively - Try schemas on multiple page types
  3. Iterate - Add fields based on what the page actually contains
  4. Optimize - Remove unnecessary fields for speed

See Schema Creation Guide for detailed instructions and examples.

Common Mistakes to Avoid

  • Over-complex schemas - Start with 2-3 fields, not 20
  • Missing fields - Don't require fields that don't exist on the page
  • No testing - Always test with example.com first, then target sites
  • Ignoring timeouts - Complex schemas take longer (45s timeout)

Babashka Advantages

Using Babashka for this skill provides:

  1. Single binary - Easy to share/install (GitHub releases, brew, nix)
  2. Fast startup - No JVM warmup, ~50ms startup time
  3. Built-in HTTP client - No external dependencies
  4. Clojure syntax - Familiar to you (Wes), expressive
  5. Retry logic & caching - Built into the skill
  6. Batch processing - Parallel extraction for multiple URLs

Example User Requests

For this skill to trigger:

  • "Scrape job listings from Docker careers page"
  • "Extract the main content from this article"
  • "Get structured product data from this e-commerce page"
  • "Pull all the news articles from this site"
  • "Extract contact information from this company page"
  • "Batch extract job listings from these 20 URLs"
  • "Get cached results for this page (avoid API calls)"
