Install
openclaw skills install datahub
A multi-domain data hub: easily access data across e-commerce, local services, recruitment, social media, short video, finance, news, Web3, gaming, sports, marketing, education, and more through natural language, avoiding tedious manual collection and processing. Provides structured/curated data or raw API JSON with filtering, validation, and transformation. Supports async querying, result polling, API supply addition, and data bounties.
Use when: the user needs data from the supported domains, wants to add new API supplies, or wants to initiate data bounties.
NOT for: local file operations, or simple Q&A without external data needs.
Easily access multi-domain data through natural language: one query, auto-aggregated, ready to use.
| Without DataHub ❌ | With DataHub ✅ |
|---|---|
| Build and maintain your own scraping infrastructure; deal with anti-bot, IP blocking, rate limiting, CAPTCHAs, and page structure changes | One natural-language query replaces the entire crawling pipeline |
| Learn, integrate, and manage auth for dozens of disparate APIs — each with its own docs, pagination, rate limits, and response formats | Unified interface across all domains; no per-platform API knowledge required |
| Hit dead ends when target data is unavailable — no fallback, no alternatives | Built-in data bounty system: request unavailable data and the community fulfills it |
DataHub provides access to multi-domain data, eliminating the hassle of integrating with each platform's API individually:
| Domain | Categories of Available Data |
|---|---|
| E-commerce | Product listings, pricing, reviews, sales trends, category rankings |
| Local Services | Business listings, service providers, ratings, operating hours, location data |
| Recruitment | Job listings, candidate profiles, salary data, hiring trends, company information |
| Social Media | User profiles, posts, engagement metrics, trending topics, influencer data |
| Short Video | Video metadata, trending content, creator analytics, engagement statistics |
| Finance | Stock data, company financials, market indicators, economic reports, crypto prices |
| News | Headlines, articles, sentiment analysis, topic clustering, source aggregation |
| Web3 | On-chain data, token metrics, NFT collections, DeFi protocols, wallet activity |
| Gaming | Game statistics, player data, esports results, in-game economies, release schedules |
| Sports | Match results, player statistics, league standings, betting odds, schedules |
| Marketing | Campaign analytics, ad performance, market research, competitor intelligence |
| Education | Course listings, institution data, academic research, learning resources, certifications |
| Domain | Examples of Available Data |
|---|---|
| E-commerce | Amazon, eBay, Alibaba, Best Buy, Shopee, Shopify, Taobao, Pinduoduo, ... (product listings, prices, reviews, sales trends, etc.) |
| Local Services | Google Maps, Yelp, Airbnb, OpenTable, Baike (business listings, service providers, ratings, business hours, etc.) |
| Recruitment | LinkedIn, Indeed, Upwork, Freelancer (job listings, candidate profiles, salary data, etc.) |
| Social Media | Twitter, Facebook, Telegram, Snapchat, WeChat, Weibo (user profiles, posts, engagement metrics, trending topics, etc.) |
| Short Video | TikTok, Douyin, Rednote, Xiaohongshu, Bilibili (video metadata, trending content, creator analytics, etc.) |
| Finance | Yahoo Finance, Bloomberg, CoinGecko (stock data, corporate financials, market indicators, cryptocurrency prices, etc.) |
| News | Reuters, BBC, Google News, Sina News (news headlines, articles, sentiment analysis, topic clustering, etc.) |
| Web3 | Etherscan, Dune Analytics, OpenSea (on-chain data, token metrics, NFT collections, DeFi protocols, etc.) |
| Gaming | Steam, Twitch, Esports Platforms (game stats, player data, esports results, etc.) |
| Sports | ESPN, Sofascore, Flashscore (match results, player statistics, league rankings, betting odds, etc.) |
| Marketing | Google Analytics, SEMrush, SimilarWeb (campaign analytics, ad performance, market research, etc.) |
| Education | Coursera, Udemy, university websites (course listings, institutional information, academic research, learning resources, etc.) |
| Travel | TripAdvisor, Expedia, Booking.com (hotel listings, flight data, user reviews, destination insights, pricing trends, etc.) |
💡 More domains available upon request. If you need data from a domain not listed above, ask or create a data bounty.
Pre-processed, cleaned, and organized data ready for analysis:
{
  "summary": "Key insights extracted from raw data",
  "structured_data": {
    "field1": "value1",
    "field2": "value2"
  },
  "trends": [...],
  "recommendations": [...]
}
Original, unmodified JSON response from the underlying API:
{
  "source": "original-api-name",
  "timestamp": "2024-01-15T10:30:00Z",
  "raw_response": { ... }
}
Human-readable report format for consumption and sharing:
# Data Report: Topic X
## Summary
Key findings and insights...
## Detailed Data
Structured presentation of results...
## Sources
List of data sources used...
All queries benefit from the following built-in capabilities:
| Capability | Description |
|---|---|
| Filtering | Filter data by date range, category, location, value thresholds, and custom criteria |
| Validation | Automatic data quality checks, duplicate removal, format verification |
| Deduplication | Remove duplicate entries across multiple data sources |
| Transformation | Convert between formats, normalize values, currency/unit conversion |
| Enrichment | Cross-reference with other datasets to add context |
| Aggregation | Summarize, group, and calculate statistics across datasets |
Users can specify filters directly in their query, for example "rating above 4 stars and price under $100".
| Capability | Description |
|---|---|
| Natural Language Queries | Convert user's natural language into API calls with automatic parameter extraction |
| Async Result Polling | Automatically poll until data is ready |
| API Supply Addition | Add new API supplies using natural language + documentation link |
| Data Bounties | Initiate data bounties when requested data is unavailable |
| Multi-Format Output | Return structured data, raw JSON, or Markdown reports |
| Data Processing | Built-in filtering, validation, deduplication, and transformation |
Before using this Skill, you need a DataHub API Key. Two ways to get one:
Please give me an API Key
or
I want to apply for an API key
💡 Tip: New users typically receive free credits sufficient for first-time use.
After obtaining your API Key, configure it using one of these methods:
export DATAHUB_API_KEY="your-api-key-here"
Create ~/.datahub/config.json:
{
  "apiKey": "your-api-key-here"
}
Create datahub.config.json in your project root:
{
  "apiKey": "your-api-key-here"
}
Configuration priority: Environment Variable > User Config > Project Config
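The priority order above can be sketched in a few lines of Node.js. This is a minimal illustration only, not the Skill's actual loader: the function names `readApiKey` and `resolveApiKey` and the injectable `readFile` parameter are assumptions made for this example.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Read "apiKey" from a JSON config file. Returns undefined if the file is
// missing, is not valid JSON, or has no non-empty apiKey string.
function readApiKey(file, readFile = (f) => fs.readFileSync(f, "utf8")) {
  try {
    const key = JSON.parse(readFile(file)).apiKey;
    return typeof key === "string" && key.length > 0 ? key : undefined;
  } catch {
    return undefined;
  }
}

// Resolve the key in the documented order:
// environment variable > user config > project config.
// All inputs are injectable (hypothetical design choice) to ease testing.
function resolveApiKey({
  env = process.env,
  homeDir = os.homedir(),
  projectDir = process.cwd(),
  readFile,
} = {}) {
  if (env.DATAHUB_API_KEY) return env.DATAHUB_API_KEY;
  const candidates = [
    path.join(homeDir, ".datahub", "config.json"),
    path.join(projectDir, "datahub.config.json"),
  ];
  for (const file of candidates) {
    const key = readApiKey(file, readFile);
    if (key) return key;
  }
  return undefined;
}
```

Calling `resolveApiKey()` with no arguments checks the real environment and files; an `undefined` result means no key was found in any of the three places.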
Use this when the user wants to fetch data from any supported domain — no scraping setup, no per-API integration work, just natural language.
Execute scripts/query.js to submit the user's natural language query:
node scripts/query.js "<user's natural language query>" [sessionId]
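Parameters: the natural-language query (required) and an optional sessionId that keeps context across multi-turn conversations.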
Response Format:
{
  "success": true,
  "processId": "xxx-xxx-xxx",
  "message": "Query submitted"
}
Execute scripts/poll.js to poll for the processed result:
node scripts/poll.js <processId> [--max-attempts 60] [--interval 1000]
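Parameters: the processId returned by scripts/query.js (required), plus the optional --max-attempts and --interval flags shown above.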
Response Format:
{
  "success": true,
  "data": { ... },
  "attempts": 5,
  "elapsed": 5234
}
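The submit-then-poll flow can also be driven from Node.js instead of the shell. The sketch below assumes that both scripts print a single JSON document to stdout, as the response formats above suggest; `runNode` and `queryAndPoll` are hypothetical helper names, not part of the Skill.

```javascript
const { execFileSync } = require("node:child_process");

// Default runner: invoke a script with node and return its stdout as text.
// execFileSync passes arguments without a shell, so characters such as "$"
// in the query need no escaping.
function runNode(script, args) {
  return execFileSync("node", [script, ...args], { encoding: "utf8" });
}

// Submit a query, then poll for its result.
// `run` is injectable (an assumption of this sketch) so the flow can be
// exercised without the real scripts.
function queryAndPoll(query, { sessionId, run = runNode } = {}) {
  const submitArgs = sessionId ? [query, sessionId] : [query];
  const submitted = JSON.parse(run("scripts/query.js", submitArgs));
  if (!submitted.success || !submitted.processId) {
    throw new Error(`Query was not accepted: ${submitted.message ?? "no message"}`);
  }
  return JSON.parse(run("scripts/poll.js", [submitted.processId]));
}
```

Because no shell is involved, a query such as "price under $100" reaches scripts/query.js unchanged.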
Use this when the user wants to add a new API supply to the system — no need to write custom integration code or manage auth/pagination on their own.
Execute scripts/query.js with a specially formatted query that includes the API documentation link:
node scripts/query.js "Add API supply: <description>. Documentation: <DocLink>" [sessionId]
# E-commerce API
node scripts/query.js "Add API supply: Amazon product search and reviews API. Documentation: https://api.example.com/docs"
# Social Media API
node scripts/query.js "Add API supply: LinkedIn company page data API. Docs: https://linkedin-api.example.com"
# Web3 API
node scripts/query.js "Supply a DEX trading volume API for Uniswap and PancakeSwap: https://defi-api.example.com/docs"
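When the query string is assembled programmatically, a small helper can enforce the "description plus documentation link" shape before anything is submitted. This is a sketch under assumptions: `buildSupplyQuery` is a hypothetical name, and the http(s)-only check is a choice made for this example rather than a documented DataHub rule.

```javascript
// Build the "Add API supply" query string from a description and a docs URL.
// Throws if the description is empty or the link is not an absolute
// http(s) URL.
function buildSupplyQuery(description, docLink) {
  // Drop surrounding whitespace and trailing periods so the template
  // below does not produce "..".
  const text = description.trim().replace(/\.+$/, "");
  if (!text) throw new Error("description must not be empty");
  let url;
  try {
    url = new URL(docLink);
  } catch {
    throw new Error(`not a valid URL: ${docLink}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${url.protocol}`);
  }
  return `Add API supply: ${text}. Documentation: ${docLink}`;
}
```

The returned string can be passed to scripts/query.js as its first argument.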
Alternative Natural Language Formats:
Execute scripts/poll.js with the returned processId:
node scripts/poll.js <processId>
Expected Response:
{
  "success": true,
  "data": {
    "apiId": "new-api-xxx",
    "domain": "e-commerce",
    "status": "registered",
    "message": "API supply successfully added and pending approval"
  }
}
Inform the user that the API supply has been registered and is pending approval, and share the returned apiId.
Use this when the user requests data that is not currently available — instead of hitting a dead end, create a bounty and let the community supply the data.
Execute scripts/query.js with a query describing the desired data and bounty details:
node scripts/query.js "Create data bounty: <data description>. Reward: <bounty details>" [sessionId]
# E-commerce data bounty
node scripts/query.js "Create data bounty: I need Amazon Best Seller rankings updated daily for the electronics category. Reward: \$100"
# Recruitment data bounty
node scripts/query.js "Bounty: Looking for LinkedIn job posting data with salary info across tech companies. Will pay \$200"
# Gaming data bounty
node scripts/query.js "I need real-time player statistics for Valorant competitive matches. Offering \$150 bounty"
Alternative Natural Language Formats:
Execute scripts/poll.js with the returned processId:
node scripts/poll.js <processId>
Expected Response:
{
  "success": true,
  "data": {
    "bountyId": "bounty-xxx-xxx",
    "status": "active",
    "domain": "gaming",
    "description": "Real-time player statistics for Valorant competitive matches",
    "reward": "$150",
    "createdAt": "2024-01-15T10:30:00Z",
    "message": "Bounty created successfully"
  }
}
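A response like the one above can be turned into a short summary for the user. The sketch below relies only on the field names in the sample response; `summarizeBounty` is a hypothetical helper, and the real response may carry other fields.

```javascript
// Turn a poll.js bounty response into a short plain-text summary.
// Field names follow the sample response; missing fields are skipped.
function summarizeBounty(response) {
  if (!response || response.success !== true || !response.data) {
    return "Bounty creation did not succeed.";
  }
  const { bountyId, status, domain, description, reward, createdAt } = response.data;
  const rows = [
    ["Bounty ID", bountyId],
    ["Status", status],
    ["Domain", domain],
    ["Description", description],
    ["Reward", reward],
    ["Created", createdAt],
  ];
  return rows
    .filter(([, value]) => value !== undefined && value !== null)
    .map(([label, value]) => `${label}: ${value}`)
    .join("\n");
}
```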
Provide the user with the bountyId, status, reward, and creation time from the response.
User Input:
"Show me the top 10 best-selling electronics on Amazon with rating above 4 stars and price under $100"
Execution:
RESULT=$(node scripts/query.js "Show me the top 10 best-selling electronics on Amazon with rating above 4 stars and price under \$100")
PROCESS_ID=$(echo "$RESULT" | jq -r '.processId')
node scripts/poll.js "$PROCESS_ID"
User Input:
"Get software engineer job listings in New York posted this week with salary range above $120k"
Execution:
RESULT=$(node scripts/query.js "Get software engineer job listings in New York posted this week with salary range above \$120k")
PROCESS_ID=$(echo "$RESULT" | jq -r '.processId')
node scripts/poll.js "$PROCESS_ID"
User Input:
"Fetch trending Twitter posts about AI from the past 24 hours with at least 1000 likes, filter out retweets"
Execution:
RESULT=$(node scripts/query.js "Fetch trending Twitter posts about AI from the past 24 hours with at least 1000 likes, filter out retweets")
PROCESS_ID=$(echo "$RESULT" | jq -r '.processId')
node scripts/poll.js "$PROCESS_ID"
User Input:
"Get the top 10 DeFi protocols by TVL on Ethereum, with 7-day change percentage"
Execution:
RESULT=$(node scripts/query.js "Get the top 10 DeFi protocols by TVL on Ethereum, with 7-day change percentage")
PROCESS_ID=$(echo "$RESULT" | jq -r '.processId')
node scripts/poll.js "$PROCESS_ID"
User Input:
"I need NBA player performance data with advanced metrics but can't find it. I'll offer $200 for anyone who can supply this."
Execution:
RESULT=$(node scripts/query.js "Create data bounty: NBA player advanced performance metrics API with historical data. Reward: \$200")
PROCESS_ID=$(echo "$RESULT" | jq -r '.processId')
node scripts/poll.js "$PROCESS_ID"
| Error Type | Handling Approach |
|---|---|
| API Key not configured | Guide user to visit https://datahub.codes to obtain an API Key |
| Invalid/Expired API Key | Prompt user to refresh their API Key or verify it's correct |
| Query timeout | Retry up to 3 times with incremental backoff |
| Polling timeout | Inform user the task is taking longer; suggest checking back later |
| Invalid response format | Attempt to extract useful information; otherwise report format issue |
| Network error | Prompt user to check network connection |
| Insufficient credits | Direct user to website to check balance and upgrade options |
| API supply already exists | Inform user the API is already available and can be used immediately |
| Bounty creation failed | Explain reason and suggest adjusting reward or description |
| Data not found (bounty eligible) | Proactively suggest creating a data bounty |
| Domain not supported | Suggest creating a bounty or API supply to add the domain |
| Filter too restrictive | Suggest broadening filter criteria and retry |
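The "retry up to 3 times with incremental backoff" rule for query timeouts could look roughly like this. It is a sketch, not the Skill's implementation: the linear 1000 ms step and the names `backoffDelays` and `withRetries` are assumptions, since the table does not state the actual delays.

```javascript
// Delays (in ms) to wait before each retry, growing by a fixed step.
// With maxRetries = 3 and stepMs = 1000 the schedule is 1000, 2000, 3000.
function backoffDelays(maxRetries, stepMs) {
  return Array.from({ length: maxRetries }, (_, i) => (i + 1) * stepMs);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `task`; on failure, wait and retry up to `maxRetries` more times.
// Rethrows the last error once the retries are used up.
// `wait` is injectable (hypothetical) so callers can skip real sleeping.
async function withRetries(task, { maxRetries = 3, stepMs = 1000, wait = sleep } = {}) {
  const delays = backoffDelays(maxRetries, stepMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      if (attempt >= delays.length) throw err;
      await wait(delays[attempt]);
    }
  }
}
```

With the defaults, a task that keeps failing is attempted four times in total: once initially, then after each of the three delays.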
The Skill should proactively suggest:
| Variable | Description | Default |
|---|---|---|
| DATAHUB_API_KEY | Required, obtain from https://datahub.codes | None |
| DATAHUB_BASE_URL | DataHub API base URL | https://datahub.codes |
| DATAHUB_TIMEOUT | Request timeout in milliseconds | 60000 |
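As a rough illustration of how these variables and their defaults fit together, the sketch below reads them from an environment object. `loadSettings` is a hypothetical helper; it looks only at environment variables, so an `undefined` apiKey here means the key would have to come from one of the config files instead.

```javascript
// Defaults taken from the table above.
const DEFAULTS = { baseUrl: "https://datahub.codes", timeoutMs: 60000 };

// Read DataHub settings from an environment object (process.env by default).
// apiKey may be undefined when DATAHUB_API_KEY is not set.
function loadSettings(env = process.env) {
  const timeoutMs = env.DATAHUB_TIMEOUT === undefined
    ? DEFAULTS.timeoutMs
    : Number(env.DATAHUB_TIMEOUT);
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`DATAHUB_TIMEOUT must be a positive integer, got "${env.DATAHUB_TIMEOUT}"`);
  }
  return {
    apiKey: env.DATAHUB_API_KEY || undefined,
    baseUrl: env.DATAHUB_BASE_URL || DEFAULTS.baseUrl,
    timeoutMs,
  };
}
```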
processId for result retrieval; results typically return in 3–30 seconds (complex queries may take longer)
sessionId to maintain context across multi-turn conversations
DocLink)